FOREWORD


The San Francisco Bay and contiguous Sacramento-San Joaquin Delta currently receive wastewaters 
from nearly five million people situated in the nine surrounding counties. This is one of the most 
rapidly expanding metropolitan and industrial areas in the nation. Its water resource provides extensive 
recreational, commercial and industrial benefits, in addition to use for wastewater disposal. 


In October 1966, the State Water Resources Control Board was directed by the Legislature to conduct 
a comprehensive study of the effects of wastewater and drainage discharges into the Bay and Delta 
and to develop the basic features of a comprehensive plan for the control of water pollution. The 
study was to determine the need for and the feasibility of a region-wide waste collection and disposal 
system, as well as to recommend other measures for maintenance of water quality. 


After reviewing a number of proposals for the conduct of this study, the State Board selected, and in 
November 1966 entered into a contract with, Kaiser Engineers as Master Contractor for the study. 
The conclusions and the designs developed during that program were primarily concerned with the 
problems of biostimulation and toxicity. Because the objectives of the program did not include the 
conduct of research, additional follow-on studies were suggested for those two parameters. 


Based on the recommendations of the 1966 Kaiser Study, the State Water Resources Control Board, 
in cooperation with the Department of Water Resources, the Department of Fish and Game, and the 
Sanitary Engineering Research Laboratory at Richmond, designed and implemented a study to evaluate 
the parameter of toxicity with a minor amount of effort on biostimulation. The results of the study 
are presented in this report. 


Volume I presents a summary of the entire investigation as well as the general conclusions and 
recommendations resulting therefrom. Volumes II through VIII contain investigatory findings of discrete 
portions of the work. All volumes are available from the California Department of General Services. 


The conclusions and recommendations are those of the research contractor and do not necessarily 
reflect opinions or policies of the State Water Resources Control Board. This report is publication 
No. 44, VolumelII, in a series of water quality publications of the State Water Resources Control Board. 


This project has been supported and financed in part by the Environmental Protection Agency under 
the Federal Water Pollution Control Act. 

ABSTRACT


As part of the Study of Toxicity and Biostimulation in San Francisco Bay-Delta Waters, the data 
collected by the University of California in its Comprehensive Study of San Francisco Bay were re- 
evaluated for the impact of relative toxicity on the species diversity index. The data were depth 
averaged and grouped into quarterly seasons to provide a data base with a sufficient number of com- 
plete observations (between 100 and 200) to warrant a statistical investigation. After testing a number 
of methods, a straightforward exhaustive search was employed, using all possible regression equations 
with up to four variables. The variables were partitioned into two groups: non-controllable variables, 
such as temperatures, chlorides, and sediment variables; and controllable pollution-related variables, 
such as relative toxicity, BOD, and dissolved oxygen deficit. 


For benthic species diversity index (BSDI), the best equation involving non-controllable variables 
is between log BSDI and log chloride concentration. The best controllable variable equation relates 
log BSDI to log chloride and log conservative relative toxicity. In the latter case, the non-controllable 
variable accounted for 35% of the variation and the controllable variable accounted for 15%. If con- 
servative relative toxicity is excluded from consideration, the best controllable variable is log of BOD. 
The variation attributed to log BOD is 10%, with 35% attributed to the non-controllable variable.
Non-conservative toxicity performed poorly in the regression equations.


For microplankton species diversity index there is no evidence that a statistically significant 
improvement is realized by including any of the controllable factors in the equation. For zooplankton 
species diversity index, a regression equation using log temperature, log suspended solids, and
log relative toxicity accounts for 31% of the observed variance, with 9% accounted for by relative 
toxicity. The same equation with nitrate-nitrogen replacing relative toxicity explained 40% of the 
variance with 28% accounted for by the nitrate. 

I. INTRODUCTION


There has been a growing concern during the past 
few years over possible relationships between biological di- 
versity and the toxicity of wastewater discharged to San Fran- 
cisco Bay. The object of this study is to quantitatively de- 
fine the significant correlations between species diversity 
indices and relative toxicity. The results of the analysis
are presented in the form of regression equations which indi- 
cate the intercorrelations between diversity indices, rela- 
tive toxicity, and other water quality factors. Results are 
presented for benthic animal species diversity index,
microplankton species diversity index, and zooplankton species
diversity index.

A. Background 


The present study is the most recent in a series of 
scientific and engineering investigations dealing with the 
toxic effects of wastewater discharges into San Francisco Bay. 


During the 1950's, Filice [1] conducted a series of studies
to qualitatively relate the variations in benthic diversity
to the quality of the overlying bay water. His studies were
conducted in relatively local areas of the bay system and 
with limited attention focused on quantitative descriptions 
of water quality characteristics. Filice concluded that there 
were significant differences in species diversity, number of 
species, and biovolume as a function of distance from waste- 
water outfalls. These differences were more pronounced around 
municipal discharges than in the vicinity of industrial dis- 
charges. Also, he concluded that there is no reason to believe 
that natural phenomena would introduce spurious regularity 
into relationships between fauna and wastewater contamination. 

The Comprehensive Study of San Francisco Bay [2], conducted
by the University of California from 1960 to 1964, provided
extensive information concerning water quality, sediment
characteristics, waste loadings, and biological health of the 
Bay. The data employed in the present study are derived from 
the University of California (UC) investigation. Species 
counts from the UC study were used to compute diversity indices
for the flora and fauna of the bay system. Additionally,
the toxicities of wastewaters entering the bay during that 
study were quantified by means of 48-hour static bioassay 

A 1969 report by Kaiser Engineers entitled "The San Francisco
Bay-Delta Water Quality Program" [3] utilized the data from the
UC study in identifying the effects of wastewater toxicity. The
study indicated a strong relationship between toxicity and
depression in benthic species diversity index. Further, it
concluded that the evidence was sufficiently strong to warrant
the use of relative toxicity as a primary water resource
management tool.

B. Relative Toxicity 


The concept of toxicity (specifically acute toxi- 
city) is derived from correlations between the strength of a 
wastewater and its toxic effects on marine flora and fauna. 
The acute toxicity of a waste is determined by means of a 
48- or 96-hour static or continuous flow bioassay test, depending
on the definition employed. Test organisms may vary somewhat,
but a species of fish, the stickleback, is most commonly used.

A Median Tolerance Limit ($TL_{50}$) is computed from
the bioassay results. The $TL_{50}$ is the concentration of the
waste, expressed as a dilution ratio, at which one-half of
the test fish survive the test period. The relative toxicity
mass emission rate of a waste is defined as the flow rate
divided by the median tolerance limit expressed as a fraction:

$$\text{Relative Toxicity} = \frac{Q(\text{waste})}{TL_{50}}$$

This amounts to standardizing all waste effluents to a common
acute toxicity strength as determined by the dilution required
to obtain 50 percent survival, and regarding this concentration
as one unit of relative toxicity (i.e., a concentration of one
part per part of relative toxicity would exhibit a $TL_{50}$ of
one).
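
As a worked illustration of the definition above, the following minimal sketch computes the relative toxicity mass emission rate from a flow and a median tolerance limit. The function name and the example flow and tolerance values are hypothetical, chosen only to show the arithmetic; they are not taken from the study data.

```python
def relative_toxicity(flow, tl50_fraction):
    """Relative toxicity mass emission rate: flow divided by the TL50.

    flow          -- waste flow rate, in any flow unit (e.g. million gallons per day)
    tl50_fraction -- median tolerance limit expressed as the volume fraction
                     of waste in the test dilution (0 < tl50_fraction <= 1)
    """
    if not 0.0 < tl50_fraction <= 1.0:
        raise ValueError("TL50 must be a fraction in (0, 1]")
    return flow / tl50_fraction


# Hypothetical discharge: 10 mgd of waste with a TL50 of 25 percent.
rt = relative_toxicity(10.0, 0.25)  # 40 mgd of relative toxicity
```

A waste that kills half the test fish only when undiluted (a fraction of 1.0) carries a relative toxicity equal to its own flow; a more toxic waste is weighted upward in proportion to the dilution it requires.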

The University of California study concluded that 
700 million gallons per day of relative toxicity were discharged
to San Francisco Bay; 56 percent of this was attributable
to municipal sources; the remainder was discharged by industry.

Since relative toxicity is defined as a mass emission
rate, it is possible to calculate relative toxicity concentration
in the receiving water body using mathematical
water quality models. Knowing the distribution of relative
toxicity for prescribed flow conditions in the Bay, it becomes
feasible to investigate the possible existence of relationships
between relative toxicity and species diversity.

C. Species Diversity Index 


The Margalef Diversity Index employed in this study
is a measure of the flora or fauna species diversity in a sample.
The concept is derived from a fundamental idea of Information
Theory [4]; that is, the occurrence of an event, $x_i$, conveys
a quantity of information. Shannon showed that a rational
measure of information content is $\log(1/p_i)$ for event $x_i$ with
probability $p_i$. The entropy of the ensemble of events is defined
as the average information content, i.e.:

$$H(p_i) = \sum_i p_i \log\frac{1}{p_i} = -\sum_i p_i \log p_i$$

where $p_i$ is the probability of event $x_i$ occurring; and since:

$$\lim_{p_i \to 0} \left(-p_i \log p_i\right) = 0$$

$$\lim_{p_i \to 1} \left(-p_i \log p_i\right) = 0$$

impossible events or constantly occurring events contribute
nothing to the entropy.

In biological diversity investigations, the Margalef
Diversity Index is expressed in a manner similar to $H(p_i)$:

$$\text{Species Diversity} = -\sum_{i=1}^{m} \frac{n_i}{N} \log\frac{n_i}{N}$$

where:
    $N$ = total number of organisms
    $n_i$ = number of organisms of species $i$
    $m$ = number of species observed in the sample

The diversity index is an attempt to quantify the notion of the
multiplicity of species present in a sample. A sample with
only one species present has zero diversity, whereas a sample
which has an equal representation of many species has a high
index.
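
The behavior described above (zero diversity for a single species, a high index for many equally represented species) can be checked with a short sketch. The function name and the sample counts are hypothetical, and the logarithm base is an assumption: base 2 is used here by default, which affects only the scale of the index.

```python
import math


def species_diversity(counts, base=2.0):
    """Information-theoretic diversity index: -sum((n_i/N) * log(n_i/N)).

    counts -- number of organisms of each species in the sample
    base   -- logarithm base (assumed; it only rescales the index)
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample must contain at least one organism")
    index = 0.0
    for n in counts:
        if n > 0:  # species with no organisms contribute nothing
            p = n / total
            index -= p * math.log(p, base)
    return index


single = species_diversity([50])              # one species only
even = species_diversity([10, 10, 10, 10])    # four species, equal counts
skewed = species_diversity([97, 1, 1, 1])     # four species, one dominant
```

With equal representation of m species the index reduces to log m, which is the largest value attainable for that number of species; a sample dominated by one species falls well below it.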
D. Relative Toxicity and Species Diversity


It has been hypothesized [3] that a direct relationship
exists between relative toxicity and benthic species
diversity index once the effects of other non-pollution variables
are accounted for. Since that investigation was made,
water quality models have been developed which can be used to
calculate the distribution of relative toxicity throughout the
bay system. The purpose of this study is to use these refined
estimates of relative toxicity in a reevaluation of the hypothesis
that species diversity and relative toxicity are related.
Neither concept, species diversity nor relative toxicity, was
itself examined; both were regarded as given. The uncovering
of a possible relationship between the two was the
principal object of the statistical procedures employed.


II. SUMMARY AND CONCLUSIONS 


Because of the complexities of the biological phenomena
which contribute to a resulting species diversity index
(SDI), statistical procedures are employed which attempt to
partition the observed variation of SDI into categories. The 
major division is between variations of SDI that are unexplain- 
able using the available data, which are called random varia- 
tion, and those variations of SDI which can be related to varia- 
tions in water quality variables. The latter relationships 
are expressed as regression equations between the significant 
water quality variables and the SDI. 

The available data are reviewed in Section III. Af- 
ter suitable averaging there appears to be a sufficient number 
of complete observations (between 100 and 200) to warrant a
statistical investigation. 

The theoretical frameworks which are available for
uncovering the SDI - water quality variations and assigning
the remaining variations to random effects are reviewed in
Section IV. There appears to be no one framework which fits
the problem at hand. Classical multiple regression analysis
assumes that the independent variables, in this case the water
quality variables, are known exactly; i.e., that there are no
measurement errors. This appears to be a fatal flaw in the
framework when it is applied to water quality investigations,
since measurement errors, whether due to analytical difficulties,
sampling procedure, or chance fluctuations, are commonplace.
Analysis based on a framework within which all the
variables are regarded as random variables (partial correlation
analysis) is also not appropriate, since many variables such as
temperature and chloride concentration are strongly deterministic.
In spite of these difficulties, a form of classical regression
analysis is employed as the major statistical procedure.

The methodologies available for regression analysis
are discussed in Section V. A major problem in regression
analysis is the choice of the variables to be used in calculating
a regression equation. After testing a number of methods,
a straightforward exhaustive search was employed which examined
all the possible regression equations employing up to four
variables (a number found to be sufficient in every case). In
addition, the variables are partitioned into two groups:
non-controllable variables, such as temperature, chlorides, and
the sediment variables; and controllable pollution-related
variables, such as relative toxicity, BOD, and dissolved oxygen
deficit (Table 4). Equations are developed which relate benthic
SDI, microplankton SDI, and zooplankton SDI to the non-controllable
variables alone, and subsequently to the non-controllable
variables and each controllable variable.
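
The exhaustive search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in the study: the function names, the variable names, and the synthetic data are all hypothetical, and ordinary least squares is solved through the normal equations with a small Gaussian elimination routine so that the sketch needs nothing beyond the standard library.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(columns, y):
    """Least-squares fit of y on the columns plus an intercept.

    Returns (coefficients, fraction of variance removed by the equation).
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    return beta, 1.0 - ss_res / ss_tot


def exhaustive_search(data, y, max_vars=4):
    """Fit every subset of up to max_vars variables; best fit first."""
    results = []
    for size in range(1, max_vars + 1):
        for subset in combinations(sorted(data), size):
            _, r2 = fit([data[name] for name in subset], y)
            results.append((r2, subset))
    return sorted(results, reverse=True)


# Synthetic data (hypothetical): y depends on chloride and RT only.
data = {
    "log_chloride": [0.1, 0.5, 0.9, 1.3, 1.7, 2.1],
    "log_rt":       [1.2, 0.4, 1.0, 0.3, 0.9, 0.2],
    "log_bod":      [0.3, 0.8, 0.2, 0.9, 0.5, 0.4],
}
y = [0.5 * c - 0.3 * t for c, t in zip(data["log_chloride"], data["log_rt"])]
ranked = exhaustive_search(data, y, max_vars=2)
```

Ranking on goodness of fit alone always favors larger equations, which is why a separate criterion for choosing among them is needed. The sketch also assumes the candidate variables are not exactly collinear; a production version would have to detect singular systems.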

Since an exhaustive search requires a criterion for
choosing the "best" equation, an empirical approach is adopted
based on the goodness of fit of each equation. To guard against
spurious results, the available observations are partitioned
into two groups, the smaller (33 percent) comprising a randomly
selected check set which is not used in the computation of the
regression equations but is used to test the resulting regression
equations. The best equation has a goodness of fit, based
on percent variance removed by the equation, that is approximately
the same for both the check set and the set employed to
generate the equation. This procedure is an attempt to verify
the resulting equation against data that were not used in its
formulation.
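
The check-set procedure described above can be illustrated with a one-variable sketch. All names and the synthetic data are hypothetical; the study's equations used up to four variables, but the comparison of percent variance removed on the fitting set and on the withheld check set works the same way.

```python
import random


def split_check_set(n_obs, check_fraction=1 / 3, seed=0):
    """Randomly withhold a check set; return (fit indices, check indices)."""
    rng = random.Random(seed)
    indices = list(range(n_obs))
    rng.shuffle(indices)
    n_check = round(n_obs * check_fraction)
    return sorted(indices[n_check:]), sorted(indices[:n_check])


def fit_line(x, y):
    """Ordinary least-squares line; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def percent_variance_removed(x, y, intercept, slope):
    """100 * (1 - residual sum of squares / total sum of squares)."""
    my = sum(y) / len(y)
    ss_tot = sum((b - my) ** 2 for b in y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 100.0 * (1.0 - ss_res / ss_tot)


# Synthetic observations (hypothetical): a linear trend plus random noise.
noise = random.Random(1)
x = [i / 10 for i in range(30)]
y = [2.0 + 0.5 * xi + noise.gauss(0.0, 0.2) for xi in x]

fit_idx, check_idx = split_check_set(len(x))
a, b = fit_line([x[i] for i in fit_idx], [y[i] for i in fit_idx])
pct_fit = percent_variance_removed(
    [x[i] for i in fit_idx], [y[i] for i in fit_idx], a, b)
pct_check = percent_variance_removed(
    [x[i] for i in check_idx], [y[i] for i in check_idx], a, b)
```

An equation whose percent variance removed drops sharply on the check set has probably fitted chance features of the fitting set; close agreement between the two figures is the property sought.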

In addition, graphical presentations of the residual
SDI, the difference between the observed value and that calculated
by the equations being considered, are examined for
any trends that may exist, either as geographical variations or
with respect to variables not yet used in the equation. This
procedure produced a series of equations which are best in the
sense of the above criteria.


A. Benthic Species Diversity Index (BSDI)


The best BSDI equation involving non-controllable 
variables is between log BSDI and log chloride concentration 
with marginally significant improvement obtained if log sedi- 
ment percent sand and log sediment organic carbon are inclu- 
ded (Figure 10). 

The best controllable variable equation relates log
BSDI to log chlorides and log conservative relative toxicity
concentration, with again marginally significant improvement
if log sediment percent sand and log sediment organic carbon
are added (Figure 20). For these equations, approximately 50
percent of the variation, as measured by the variance of log
BSDI, cannot be accounted for and is assigned to random
causes. The non-controllable variables account for 35 percent
of the variation and relative toxicity accounts for 15 percent
(Figure 20).


If conservative relative toxicity is excluded from
consideration, the best controllable variable is log BOD5, with
the same non-controllable variables as above. The variation
attributable to log BOD5 is 10 percent, with 35 percent attributable
to non-controllable variables (Figure 33).

All other controllable variables are not significantly
able to remove variance in observed BSDI. Specifically,
non-conservative relative toxicity and ammonia-nitrogen are
ineffective.

With regard to the unexplained random variation, an
estimate based on replicate BSDI samples indicates that approximately
25 percent of the total log BSDI variation is attributable
to measurement error.

Based on the form of the equations found, which relate 
log BSDI to log conservative RT and log BOD, it is possible to 
compute the percent change in BSDI to be expected from a percent 
change in either RT or BOD. The ratio of percent BSDI increase 
to either percent RT or percent BOD decrease is estimated to be 
ee Paalas mat g8 Js 

As a further investigation into the possible effect of 
a reduction in either conservative RT or BOD, the estimated effect
of an 80 percent removal of either of these constituents
is calculated and presented with the 95 percent confidence limits.
A 16 percent increase is projected in log BSDI. However, the
confidence limits for the observed BSDI are quite large (due
to the 50 percent unexplained variance) and encompass the large
majority of the data obtained under the condition of no removal 
(Figure 41). Thus, although a significant change in the control- 
lable portion of the variation is projected (Figure 45), an ac- 
tual survey to detect such a change would need to be quite ex- 
tensive. 

A difficult question which is only partially resolved 
in this study concerns the interchangeability of BOD and conser- 
vative relative toxicity as the primary controllable variable 
with regard to BSDI. An attempt at an analysis of this ques- 
tion was made using partial correlation analysis. (The ques- 
tion is not addressed using regression analysis because it is 
not possible to partition the variation to be assigned to each 
variable if a pair of the variables are themselves highly corre- 
lated, as are conservative RT and BOD). The results (Table 14) 
indicate that there is a small but statistically significant 
correlation between log conservative RT and log BSDI even when 
the effects of BOD are removed. However, no statistically significant
correlation remains between log BOD and log BSDI if the
effects of log conservative RT are removed. This result tends
to indicate that conservative relative toxicity, and not BOD,
is the primary variable. However, the results are on the border
of statistical insignificance, and since the primary assumption
of partial correlation analysis is violated by this data
set, the results are only indications and not definitive.

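
The partial correlation used in the comparison above measures the correlation between two variables after the linear effect of a third is removed. A minimal sketch of the standard first-order formula follows; the function names and the example series are hypothetical and are not taken from Table 14.

```python
import math


def pearson(x, y):
    """Ordinary product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """Correlation of x and y with the linear effect of z removed."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1.0 - rxz ** 2) * (1.0 - ryz ** 2))


# Hypothetical series standing in for log RT, log BSDI, and log BOD.
log_rt = [1.0, 2.0, 3.0, 4.0]
log_bsdi = [2.0, 1.0, 4.0, 3.0]
log_bod = [1.0, -1.0, -1.0, 1.0]

r_rt_given_bod = partial_correlation(log_rt, log_bsdi, log_bod)
```

In this contrived example the third series is uncorrelated with the other two, so the partial correlation equals the simple correlation. When the controlling variable is strongly correlated with both of the others, as conservative RT and BOD are reported to be, the partial coefficient can shrink toward zero, which is the effect the comparison relies on.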
B. Microplankton Species Diversity Index (MSDI) 


Non-controllable variations in the MSDI are best explained
with an equation consisting of temperature, dissolved
silica, and log Secchi depth (Figure 53). The equation explains
30 percent of the observed MSDI variance and demonstrates a
high degree of reliability in its coefficients.

Relative toxicity and ammonia-nitrogen are the only 
controllable factors that favorably impact the MSDI equation 
(Figures 61 and 62). However, there is no evidence that a
statistically significant improvement is realized by including 
either variable in the predictor equation. 

No conclusions are made regarding the MSDI residual
variance (70 percent) left unexplained by the regression equations.
The variation appears to be Gaussian distributed.

C. Zooplankton Species Diversity Index (ZSDI)


The best ZSDI equation involving only non-control- 
lable factors is that between log ZSDI and log temperature 
and log suspended solids with a marginally significant improve- 
ment when log pH is also included (Figure 63). 

The best controllable variable is nitrate-nitrogen.
A log suspended solids, log nitrate-nitrogen regression
equation for the ZSDI accounts for 38 percent of the total var- 
iance in the ZSDI (Figure 72). 

A ZSDI equation involving log temperature, log suspended
solids, and log relative toxicity accounts for 31 percent
of the observed variance. The relative toxicity variable
removes 9 percent of that variance as compared to 28 percent 
for the nitrate-nitrogen variable in a comparable equation. 

There appears to be a strong bias in the residuals
for the ZSDI equations. Attempts at defining the factors
responsible for this have been unsuccessful. As in the case
of the MSDI, there are insufficient data regarding sampling
variations to draw conclusions about the nature of the residual
variance indicated in the analysis.

DISCUSSION OF RELATIVE TOXICITY AS A REGULATORY TOOL


The statistical analysis used the 1960-1964 Univer- 
sity of California data base. Questions regarding measurement 
techniques, species identification, etc., while significant, 
are beyond the scope of the present study. 

The benthic species diversity index is employed as 
one dependent variable which provides a measure of environmen- 
tal quality. That is, the presence of large numbers of spe- 
cies is presumed to indicate high environmental quality. The 
project attempts to statistically relate changes in benthic 
species diversity index to water quality measurements and/or 
variables which could ultimately be employed to establish en- 
vironmental control policies. One of the independent variables 
considered in the analysis was relative toxicity treated as a 
conservative variable. Relative toxicity in the Bays is a 
calculated quantity in contrast to the other water quality 
variables which were measured during the 1960-1964 surveys. 

The results of the statistical analysis indicate 
that the log of the calculated conservative relative toxicity
can account for approximately 14 percent of the total variance
in benthic species diversity index. Extrapolation of the statistical
results obtained indicates that, for example, an increase
of 0.2 percent in benthic species diversity index would
be associated with a 1 percent decrease in relative toxicity
level throughout the Bay. These results suggest that calculated
conservative relative toxicity can be considered as a candidate
variable for use in a regulatory policy.

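
The 0.2 percent to 1 percent ratio quoted above is the kind of result a log-log regression yields directly, because the coefficient on a logged variable is the ratio of percent changes. As a sketch, with $a$ and $b$ as illustrative symbols and the value of $b$ inferred from the quoted ratio rather than taken from a reported coefficient:

```latex
\log \mathrm{BSDI} = a + b \,\log \mathrm{RT} + (\text{other terms})
\quad\Longrightarrow\quad
\frac{\Delta \mathrm{BSDI}}{\mathrm{BSDI}} \approx b \,\frac{\Delta \mathrm{RT}}{\mathrm{RT}}
```

With $b \approx -0.2$, a 1 percent decrease in relative toxicity corresponds to an increase of about 0.2 percent in BSDI. The approximation holds for small changes; for large reductions the exact relation $\mathrm{BSDI} \propto \mathrm{RT}^{\,b}$ would be used instead.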
A. Discussion 


As a regulatory tool, relative toxicity would be 
unique in terms of attempting to control the subtle influence 
of wastewater discharges on the overall biota, through its in- 
fluence on species diversity index. This unique characteristic 
is associated with the acceptance of benthic species diversity 
as a measure of environmental quality. Relative toxicity is
essentially a dilution calculation employing the results of an
acute fish toxicity test. This variable can statistically account
for a small but potentially significant portion of the
observed data. However, no causal linkage between relative
toxicity and benthic species diversity index has yet been
postulated.


Studies on the removal and character of substances 
causing fish toxicity suggest that three components may contribute
to acute toxicity [19]. The first component appears
to be unrelated to measured constituents in primary effluents, 
and it appears to be removed from biologically treated wastes. 
This would indicate that the toxicity is either biodegradable 
or removed by chemical or physical processes. Similar pro- 
cesses would probably occur in the natural environment. It 
would therefore appear that this component of toxicity is non- 
conservative. The remaining acute fish toxicity in municipal 
wastes is related to the concentrations of ammonia and Methylene 
Blue Active Substances (MBAS). Ammonia oxidizes and is a 
readily available plant nutrient. MBAS is generally associ- 
ated with detergents, although many other organic and inorganic 
compounds also can react with the methylene blue. Present 
linear alkylate sulfonate (LAS) based detergents are considered 
to be, at least in part, biodegradable. The alkyl benzene 
sulfonate (ABS) detergents employed in the early 1960s were 
far less degradable and may have remained in the environment,
exhibiting the properties of a conservative substance.

It is therefore reasonable to expect that the
variation in benthic species diversity index would be associ- 
ated with both conservative and non-conservative water 
quality variables. The conservative variable might be the 
detergent ABS used in the 1960s while the reactive variable 
could be considered as acute toxicity removed in the activated 
sludge process or the component associated with ammonia. It 
would be particularly reassuring if the ammonia concentra- 
tions in receiving waters were shown to be related to benthic 
species diversity index. The results of the present analysis 
indicate that conservative relative toxicity accounts for
14 percent of the variance in species diversity index, where- 
as addition of reactive relative toxicity and ammonia does 
not increase the reliability of the correlations generated. 
Partial correlation analysis leads to the same conclusion. 
The studies of municipal waste toxicity [19] did indicate that
approximately 25 percent of the wastes' toxicity to fish was 
not removed by conventional waste treatment and thus is at 
least partially conservative. 

The results of the present analysis indicate that
approximately 35 percent and 15 percent of the variance in
benthic species diversity index could be statistically associated
with chlorides and relative toxicity, respectively; the
remaining 50 percent of the variance was not accounted for
by a consideration of the variables measured. It will, therefore,
be difficult to statistically discern the effect of a
relative toxicity regulatory policy in the San Francisco
Bay System.

A review of the information available on the removal
of acute toxicity and the proposed standards for effluent
fish toxicity indicates that it is often possible to
meet relative toxicity criteria with treatment systems required
to meet other water quality criteria.


B. Summary


Based on the results thus far available, the con- 
cept of relative toxicity has not been discredited. Neither 
has an overwhelming weight of evidence been generated to 
demonstrate its scientific validity. The overall concept 
has the desirable objective of attempting to control subtle 
influences on environmental quality employing the benthic 
diversity index. However, there are a number of significant
technical questions regarding the reliability and sensitivity
of benthic species diversity index to relative toxicity.
With the use of benthic species diversity index as a criterion
of biotic health, it may be very difficult to determine
if a waste management program is effective.

A. Background


The data set utilized in the present study was col- 
lected between 1960 and 1964 by the University of California 
at Berkeley in fulfillment of a contract between the State 
Water Quality Board and the Regents of the University of Cali- 
fornia. The study area is conveniently divided into six major 
areas as shown in Figure 1. From south to north, these are 
South Bay, Lower Bay, Central Bay, North Bay, San Pablo Bay, 
and Suisun Bay. The 48 primary water quality and sediment 
sampling stations are also shown in the figure. 

During the four-year data collection program, a to- 
tal of 72 sampling cruises were conducted. Table 1 presents 
the cruise schedule for the study area, and indicates the 
number of sampling cruises during each year. The most heavily 
sampled areas are Suisun and San Pablo Bay, each being sur- 
veyed eighteen times. North Bay and Central Bay were sampled 
during only six cruises. Measurements of thirty-five variables
were numerous enough for consideration in this study.

TABLE 1

UNIVERSITY OF CALIFORNIA STUDY
MONITOR CRUISE SCHEDULE

                         Study Years
Bay              1960-61   1961-62   1962-63   1963-64
South Bay           9         3         0         0
Lower Bay           0         0         0        12
Central Bay         0         0         0         6
North Bay           0         0         6         0
San Pablo Bay       6         7         5         0
Suisun Bay          7         6         5         0

Note: Number of cruises during each study year is indicated.

These included 15 aqueous water quality variables, 14 sediment
quality variables, 3 relative toxicity variables, and 3 species
diversity indices. The three species diversity indices are
the dependent variables in this study.

B. Relative Toxicity Data 


Relative toxicity concentrations were provided by 
the California Department of Water Resources. The source of
these concentrations was mathematical model simulations for
twenty-one combinations of net delta outflow and relative
toxicity decay rate. The net delta outflow conditions were
Poors O00, 5,000, 10,000, 20,000, 50,000, and 170,000 cubic
feet per second. The relative toxicity decay rates (K) considered
were: K = 0.0/day, K = 0.1/day, and K = 0.2/day. The
zero decay rate corresponds to treating relative toxicity as
a conservative substance (i.e., it does not undergo physical,
chemical, or biological decay as a function of time).

Average 1960-1964 loadings were applied to all the
model simulations. The results were incorporated into the
data base by first assigning an appropriate average monthly
net delta outflow and a coarse grid model node number to each
station-date combination in the data set. The relative toxicity
concentrations were then estimated by log interpolating
between modeled flow conditions using the following formula:

$$RT(x,t) = \left[RT(x,t_2) - RT(x,t_1)\right] \frac{\ln\left[Q(t)/Q(t_1)\right]}{\ln\left[Q(t_2)/Q(t_1)\right]} + RT(x,t_1)$$

where:
    $RT(x,t)$ = relative toxicity concentration, ml/l, at node $x$
                and net delta outflow condition $Q(t)$
    $Q(t)$ = average monthly net delta outflow, cfs
    $Q(t_1)$ = modeled net delta outflow < $Q(t)$
    $Q(t_2)$ = modeled net delta outflow > $Q(t)$
    $RT(x,t_1)$, $RT(x,t_2)$ = computed relative toxicity concentrations
                at delta outflows $Q(t_1)$ and $Q(t_2)$, respectively
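
The log interpolation between modeled flow conditions described above can be sketched as follows, assuming the interpolation is linear in the logarithm of net delta outflow. The function name and the example flows and concentrations are hypothetical.

```python
import math


def interpolate_rt(q, q1, rt1, q2, rt2):
    """Interpolate relative toxicity linearly in ln(Q) between two modeled flows.

    q        -- average monthly net delta outflow for the observation
    q1, q2   -- bracketing modeled net delta outflows (q1 < q < q2 in normal use)
    rt1, rt2 -- modeled relative toxicity concentrations at q1 and q2
    """
    if q <= 0 or q1 <= 0 or q2 <= 0:
        raise ValueError("flows must be positive")
    if q1 == q2:
        raise ValueError("bracketing flows must differ")
    weight = math.log(q / q1) / math.log(q2 / q1)
    return rt1 + (rt2 - rt1) * weight


# Hypothetical node: 4.0 ml/l at 10,000 cfs and 2.0 ml/l at 40,000 cfs.
at_lower = interpolate_rt(10000.0, 10000.0, 4.0, 40000.0, 2.0)
at_upper = interpolate_rt(40000.0, 10000.0, 4.0, 40000.0, 2.0)
midway = interpolate_rt(20000.0, 10000.0, 4.0, 40000.0, 2.0)
```

Because the weight depends on the ratio of flows, the midpoint of the interpolation falls at the geometric mean of the two modeled flows (20,000 cfs in this example) rather than at their arithmetic mean, which suits flow conditions spaced over more than an order of magnitude.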

The optimum method for handling the large quantities
of data involved in this study proved to be a generalized data
retrieval and storage system named DRS [5]. Its utility is reflected
in the variety of data storage modes, error checking
capabilities, selective retrieval processes, histogramming,
sorting, and basic statistical operations available.

The DRS system was employed principally to store the 
punch card data from the University of California study, while 
eliminating coding errors, invalid FORTRAN fields, and obvious 
outliers. At the completion of this task, there were approxi- 
mately 5,400 station-time-depth observations, and over 68,000 
pieces of data stored on disk files. 

The DRS statistical operations were used to gener- 
ate baywide statistics and histograms for all the variables. 
These results, for the aqueous water quality data, sediment 
data, and species diversity data, are presented in Appendix A. 
The results of this analysis indicate that there is sufficient 
variation in all of the variables to warrant their inclusion 
in subsequent analysis. Furthermore, significant spatial varia- 
tions are indicated by the differences in statistical behavior 
from bay to bay. 

The spatial variations are clearly seen in five-station
moving averages of mean station conditions. A graphical
representation of five-station moving average species diversity
indices is presented in Figures 2, 3, and 4. The longitudinal
trends typify the behavior of most of the water quality and
sediment characteristic data, as well as the SDI data.
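A five-station moving average of mean station conditions can be sketched as follows. The report does not say how the ends of each station sequence were treated; this sketch assumes that only full five-station windows are averaged, and the function name is hypothetical.

```python
def moving_average(station_means, window=5):
    """Average each run of `window` consecutive stations (full windows only)."""
    if window < 1 or window > len(station_means):
        raise ValueError("window must be between 1 and the number of stations")
    return [
        sum(station_means[i:i + window]) / window
        for i in range(len(station_means) - window + 1)
    ]
```

With seven stations ordered along the bay, the function returns three averages, one for each group of five adjacent stations.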
D. Depth-Averaged Data 


The major difficulty that arises from this data set
is that it contains very few complete observations, as
compared to the total number of observations available.
Therefore the data set was depth averaged to increase the number
of complete observations. In order to check the statistical
validity of this procedure the DRS system was used to generate
baywide means and standard deviations of all variables at
incremental depths. These statistics were then compared using
a Student's t-test, with the null hypothesis, H0: the mean of
a variable at any depth interval is not significantly different
from the population mean for that variable. A five percent
significance level was used for the test. In most cases, the
null hypothesis was not rejected and it was concluded that the
statistical variation of the variable would be preserved in a
depth-averaging procedure. There were four notable exceptions:
chlorosity (5), dissolved silica (5), dissolved oxygen (3), and
percent dissolved oxygen saturation (3). The number in
parentheses refers to the number of bays in which the hypothesis
was rejected.

This behavior is expected, due to the dissolved oxygen
and density stratifications that are known to occur in the
system. In light of the study objectives, it was considered more
important to create complete records through depth averaging
than to preserve the depth variations in these four variables.
Furthermore, depth averaging tends to reduce the measurement
error in the variables, which is an important benefit.
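The depth-interval check can be illustrated with a one-sample t statistic. The report does not give the exact form of the test that was run in DRS, so the statistic below is an assumed, textbook version; the function names are hypothetical, and the critical value must be supplied by the caller for the appropriate degrees of freedom at the five percent level.

```python
import math
import statistics


def one_sample_t(sample, population_mean):
    """t statistic for H0: the mean of `sample` equals `population_mean`."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - population_mean) / standard_error


def rejects_null(sample, population_mean, critical_t):
    """Two-sided test: reject H0 when |t| exceeds the tabulated critical value."""
    return abs(one_sample_t(sample, population_mean)) > critical_t
```

For five observations (four degrees of freedom) the two-sided five percent critical value is 2.776, so a depth interval would be flagged only if its t statistic exceeded that value in magnitude.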

The depth-averaged data base consists of 528 observation
sets. Within this data base, the following breakdown
occurs with respect to each dependent variable:

    Microplankton SDI     135* complete observations, 30 variables
    Zooplankton SDI       178  complete observations, 30 variables
    Benthic Animal SDI    178  complete observations, 20 variables

*There are 20 fewer complete observations if aqueous reactive
phosphorus is included in the data set.

The benthic animal species diversity index referred
to here, and throughout the report, is that computed using the
number of animals of each species present in the sample. An
alternative form of the SDI uses the total volume displaced
by each species. The latter appears in the University of
California data base, but does not occur in enough
observations to warrant its inclusion in the regression analysis.

One other modification was made in the data base.
The dissolved oxygen and percent dissolved oxygen saturation
variables were replaced with dissolved oxygen deficit (mg/l)
and dissolved oxygen saturation (mg/l), respectively, the
idea being that DO deficit is a directly pollution-related
variable and DO saturation a purely environmental or
non-controllable variable.
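The report does not state how the two replacement variables were computed. One consistent way to obtain them from the original pair, shown here only as an assumption, is to recover the saturation concentration from the measured DO and the percent saturation and then take the deficit as the difference; the function name is hypothetical.

```python
def do_saturation_and_deficit(do_mg_l, percent_saturation):
    """Return (DO saturation, DO deficit), both in mg/l.

    Assumes percent_saturation = 100 * DO / saturation concentration.
    """
    if percent_saturation <= 0:
        raise ValueError("percent saturation must be positive")
    saturation = do_mg_l * 100.0 / percent_saturation
    deficit = saturation - do_mg_l
    return saturation, deficit
```

A sample with 7.2 mg/l of dissolved oxygen at 80 percent saturation would be recorded as a saturation value of 9.0 mg/l and a deficit of 1.8 mg/l.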
E. Quarterly Averaging The Data Base 


After pursuing the statistical analysis on the
depth-averaged data base with limited success, it was decided that
time averaging, perhaps on a seasonal basis, might substantially
improve the correlations between species diversity and the
independent variables. Three month (quarterly) averages of the
depth-averaged data created between 120 and 158 complete
observations for each species diversity index. The total number
of observations was 184. All subsequent regression analysis
utilized this depth-averaged, quarterly-averaged data base.

The time-space variations of the data are indicated
in the longitudinal plots of each variable, which are presented
in Appendix B. Data for the four quarters are indicated by
the hexadecimal equivalent of the first month of the quarter.
Appendix C contains the complete tabular listing of the data
contained in the depth-averaged, quarterly-averaged data file.
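The quarterly averaging and the hexadecimal quarter label can be sketched as below. The record layout (station, year, month, value) and the choice to keep years separate are assumptions made for illustration; the report does not describe the file layout.

```python
from collections import defaultdict


def quarter_code(month):
    """Hexadecimal digit of the first month of the quarter: 1, 4, 7 or A."""
    first_month = 3 * ((month - 1) // 3) + 1
    return format(first_month, "X")


def quarterly_average(records):
    """Average values by (station, year, quarter code).

    records: iterable of (station, year, month, value) tuples.
    """
    groups = defaultdict(list)
    for station, year, month, value in records:
        groups[(station, year, quarter_code(month))].append(value)
    return {key: sum(values) / len(values) for key, values in groups.items()}
```

Under this labeling, observations from October through December carry the code A, since the first month of that quarter is month 10.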


IV. MULTIPLE LINEAR REGRESSION - THEORETICAL FRAMEWORKS


A. Introduction


The equations for estimating a line of best fit for
a set of data have been available since they were formulated
by Gauss, and these equations (which are called the normal
equations) are still the basis for such an analysis. However, there
are several available theoretical frameworks which incorporate
various assumptions concerning the statistical nature of the
data. The surprising result is that for these frameworks the
resulting estimation equations have the form of the normal
equations. This fact has tended to obscure the fundamental
differences in the interpretations of the resulting "line of
best fit", which in fact depend strongly on these assumptions
[ref. illegible].

A series of theoretical frameworks is discussed below
in the context of their applicability to the estimation of
suitable equations relating observed species diversity
indices to observed water quality variables and calculated relative
toxicity values.


B. Maximum Goodness of Fit 


The most direct approach to the problem of fitting
an equation to a set of data is to decide on a criterion which
measures the goodness of fit of the equation to the data and
then to determine the parameters of the equation which maximize
this criterion [ref. illegible]. In order to proceed with this
method both an equation and a criterion must be assumed. The
simplest useful equation is one which expresses a linear
relationship between the independent variables,
$x_1, x_2, \ldots, x_N$, and the dependent variable, y. For the
present analysis $x_1, x_2, \ldots, x_N$ are the water quality
variables and y is the species diversity index. Thus, it is
assumed that y is related to the x's via an equation:

$$Y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_N x_N \tag{4.1}$$

where Y is the fitted or estimated value of y and the b's are
to be found. The simplest goodness of fit criterion, simple
in the sense that a tractable mathematical problem results,
is one of minimum sum of squares of the deviation between the
fitted dependent variable, Y, and the observed, y. Thus, if
$y_i$ is the value of y for the i-th observation and
$x_{1i}, \ldots, x_{Ni}$ are the i-th observation of the x's,
then it seems reasonable to choose $b_0, b_1, \ldots, b_N$ such
that the fitted values, $Y_i$:

$$Y_i = b_0 + b_1 x_{1i} + \cdots + b_N x_{Ni}, \qquad i = 1, \ldots, M \tag{4.2}$$

and the observed values, $y_i$, have minimum sums of squares of
the residuals for the M observations available.

If no other criteria are specified, the equations which estimate
the b's can be found by setting to zero the partial derivatives
of the criterion with respect to each of the b's. Thus for $b_k$,
k = 0, 1, ..., N:

$$\frac{\partial}{\partial b_k} \sum_{i=1}^{M} (y_i - Y_i)^2 = 0 \tag{4.3}$$

or:

$$\sum_{i=1}^{M} (y_i - Y_i)\,\frac{\partial Y_i}{\partial b_k} = 0 \tag{4.4}$$

which, upon substitution of the assumed form of Y (Equation (4.2))
and noting that:

$$\frac{\partial Y_i}{\partial b_k} = x_{ki} \tag{4.5}$$

becomes:

$$\sum_{i=1}^{M} \left[ y_i - b_0 - b_1 x_{1i} - \cdots - b_N x_{Ni} \right] x_{ki} = 0 \tag{4.6}$$

or:

$$b_0\,Sx_k + b_1\,Sx_1x_k + \cdots + b_N\,Sx_Nx_k = Syx_k \tag{4.7}$$

where:

$$Syx_k = \sum_{i=1}^{M} y_i x_{ki}\,; \qquad Sx_k = \sum_{i=1}^{M} x_{ki} \tag{4.8}$$

and:

$$Sx_jx_k = \sum_{i=1}^{M} x_{ji} x_{ki} \tag{4.9}$$

so that a set of N + 1 equations result of this form:

$$\begin{aligned}
b_0\,Sx_1 + b_1\,Sx_1x_1 + \cdots + b_N\,Sx_Nx_1 &= Syx_1 \\
&\;\;\vdots \\
b_0\,Sx_N + b_1\,Sx_1x_N + \cdots + b_N\,Sx_Nx_N &= Syx_N
\end{aligned} \tag{4.10}$$

together with the equation for k = 0, for which $x_{0i} = 1$, so
that $Sx_0 = M$, $Sx_jx_0 = Sx_j$ and $Syx_0 = \sum y_i$.

These are the well known normal equations which can be solved for
the b's. It can be seen from the derivation that if any other
criterion were chosen, e.g., minimize the sum of the absolute
values of the residuals, the resulting equations for the b's
would not be simultaneous linear equations as above but
simultaneous non-linear equations, which are usually very difficult
to solve.
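As an illustration of how the normal equations are assembled and solved, the following sketch forms the sums of cross products as in Equation (4.7) and solves the N + 1 simultaneous linear equations by Gaussian elimination. It is a minimal sketch, not the program used in the study; the function names are hypothetical, and a leading column of ones is assumed so that the k = 0 equation carries the constant term.

```python
def solve_linear_system(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_normal_equations(xs, ys):
    """Return [b0, b1, ..., bN] for M observations of N variables.

    xs: list of M rows, each holding the N independent variables.
    ys: list of M observed values of the dependent variable.
    """
    design = [[1.0] + list(row) for row in xs]  # x_0 = 1 carries b0
    n = len(design[0])
    # Sums of cross products, S x_j x_k, and S y x_k.
    s = [[sum(r[j] * r[k] for r in design) for j in range(n)] for k in range(n)]
    syx = [sum(y * r[k] for r, y in zip(design, ys)) for k in range(n)]
    return solve_linear_system(s, syx)
```

When the observations lie exactly on a plane, the fitted b's reproduce the coefficients of that plane, which is a convenient check on the arithmetic.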

The procedure for obtaining the line of best fit
using the minimum sum of squares of the residuals criterion is,
therefore, straightforward. However, this framework suffers from
serious drawbacks: 1) the form of the equation is specified
a priori; 2) no account is taken of the statistical variability
inherent in the observations used to compute the resulting
equations. Examples of a direct application of this method to
the species diversity water quality data are presented in the
[illegible]. Although the resulting fit achieved is quite
good, the resulting equations appear useless. In particular,
some coefficients of the pollution-related variables are
positive, indicating that, if these equations are to be believed,
an increase in pollution-related concentrations will increase
the species diversity index.


C. Minimum Error Variance


In order to take account of the statistical nature
of the data and to be able to make statements about the
reliability of the equations that result, most formulations of the
regression equations include a random component as part of
the assumed framework. The typical formulation is as
follows [8]. y is assumed to be a gaussian random variable with
mean $\eta$ and variance $\sigma^2$. The mean of y is assumed to
be related to the observed independent (and not random)
variables, $x_k$, via an equation of the form:

$$\eta = \beta_0 + \beta_1 (x_1 - \bar{x}_1) + \cdots + \beta_N (x_N - \bar{x}_N) \tag{4.11}$$

where the $x_1, x_2, \ldots, x_N$ are known exactly and in no way
influence the statistical behavior of y other than through their
influence on the mean of y. It is conventional to subtract
the average value, $\bar{x}_k$, of the $x_k$ from the $x_k$, where
$\bar{x}_k = \frac{1}{M} \sum_{i=1}^{M} x_{ki}$, in writing this
equation, but it is not assumed that the $x_{ki}$ are in any way
random; they are assumed to be known exactly.

The problem is then to estimate the $\beta$ coefficients
which determine the way in which the independent variables
affect the mean value of y.

Thus there is a deterministic component of y,
denoted by $\eta$, and a random component, v, and the actual value
of y which results at the i-th observation, $y_i$, is the sum of
these two components: $y_i = \eta_i + v_i$, where $v_i$ is assumed
to be an independent gaussian random variable with zero mean and
variance $\sigma_v^2$. The deterministic component of y is assumed
to be a function of the known values of the $x_{ki}$ via the
equation:

$$\eta_i = \beta_0 + \beta_1 (x_{1i} - \bar{x}_1) + \cdots + \beta_N (x_{Ni} - \bar{x}_N) \tag{4.12}$$

The problem is to estimate the $\beta$ coefficients, which then are
the estimates of how the deterministic component of y varies
with the $x_k$. Letting Y be this estimate of the deterministic
component of y (Y is an estimate of $\eta$), and $b_k$ be the
estimates of $\beta_k$, then:

$$Y = b_0 + b_1 (x_1 - \bar{x}_1) + \cdots + b_N (x_N - \bar{x}_N) \tag{4.13}$$

The criterion used to find the b's is to minimize the variance
of the random component $v_i$. In other words, pick the b's so
that the resulting variance of $v_i$ is minimum, thereby forcing
as much of the observed variation of y as possible into its
deterministic part, $\eta_i$. The variance of $v_i$ is:

$$\sigma_v^2 = \frac{1}{M} \sum_{i=1}^{M} v_i^2 \tag{4.14}$$

(mean of $v_i$ assumed to equal zero). But $v_i = y_i - \eta_i$ so
that:

$$\sigma_v^2 = \frac{1}{M} \sum_{i=1}^{M} (y_i - \eta_i)^2 \tag{4.15}$$

and $\eta_i$ is assumed to have the form given in Equation (4.12).
To minimize this variance with respect to the b's, the partial
derivatives $\partial \sigma_v^2 / \partial b_k$ are set to zero.
The resulting equations have the form:

$$b_0 = \frac{1}{M} \sum_{i=1}^{M} y_i \tag{4.16}$$

$$\begin{aligned}
b_1\,Sx_1x_1 + b_2\,Sx_1x_2 + \cdots + b_N\,Sx_1x_N &= Syx_1 \\
&\;\;\vdots \\
b_1\,Sx_1x_N + b_2\,Sx_2x_N + \cdots + b_N\,Sx_Nx_N &= Syx_N
\end{aligned} \tag{4.17}$$

where the sums S are now formed from the centered values
$(x_{ki} - \bar{x}_k)$.

The solution of these equations is the minimum variance
estimates of the $\beta$'s. The remarkable result is that these
equations are exactly the same as those which come from the least
squares formulation, which incorporates no statistical
component in its framework. However, the addition of a statistical
component allows one to estimate the effect of the random
component on the estimates $b_k$. It can be shown that:

$$E[b] = \beta$$

$$\mathrm{cov}[b] = \sigma_v^2\,S^{-1}$$

where b is the vector of elements $b_k$, i.e., $b = [b_k]$,
$\beta = [\beta_k]$, and S is the matrix of elements $Sx_jx_k$;
$\sigma_v^2$ is the variance of v, the statistical portion of y.
It can be estimated from the residual variance of y, i.e.:

$$s_v^2 = \frac{1}{M - N - 1} \sum_{i=1}^{M} (y_i - Y_i)^2$$

is an unbiased estimate of $\sigma_v^2$.


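In order to display the details of the equations, the
formulas for the case of two independent variables are presented
below. For:

$$y_i = \eta_i + v_i\,; \qquad \eta_i = \beta_0 + \beta_1 (x_{1i} - \bar{x}_1) + \beta_2 (x_{2i} - \bar{x}_2)$$

with the $v_i$ normal independent random variables, mean = 0,
variance = $\sigma_v^2$, the minimum variance estimates $b_0$,
$b_1$ and $b_2$ are:

$$b_0 = \frac{1}{M} \sum_{i=1}^{M} y_i \tag{4.18}$$

$$b_1 = (Syx_1\,Sx_2x_2 - Syx_2\,Sx_1x_2)/\Delta \tag{4.19}$$

$$b_2 = (Syx_2\,Sx_1x_1 - Syx_1\,Sx_1x_2)/\Delta \tag{4.20}$$

where the sums are formed from the centered values,
$Sx_jx_k = \sum_{i=1}^{M} (x_{ji} - \bar{x}_j)(x_{ki} - \bar{x}_k)$
and $Syx_k = \sum_{i=1}^{M} y_i (x_{ki} - \bar{x}_k)$, and:

$$\Delta = Sx_1x_1\,Sx_2x_2 - (Sx_1x_2)^2$$

and:

$$V(b_0) = \sigma_v^2 / M$$

$$V(b_1) = \sigma_v^2\,Sx_2x_2 / \Delta$$

$$V(b_2) = \sigma_v^2\,Sx_1x_1 / \Delta$$

$$\mathrm{Cov}(b_1, b_2) = -\sigma_v^2\,Sx_1x_2 / \Delta$$

(These expressions carry equation numbers (4.21) through (4.28) in
the original; the individual numbers are not legible.)

In addition it is possible to estimate the variance of the
estimate Y:

$$V(Y) = V(b_0) + (x_1 - \bar{x}_1)^2\,V(b_1) + (x_2 - \bar{x}_2)^2\,V(b_2) + 2 (x_1 - \bar{x}_1)(x_2 - \bar{x}_2)\,\mathrm{Cov}(b_1, b_2) \tag{4.29}$$

Thus, a full range of statistical information is available
concerning the variability of the estimates, $b_k$, and the effect
this variation has on the prediction of $\eta$ based on the $x_k$.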
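The two-variable formulas can be checked numerically with a short sketch. It assumes centered sums of squares and cross products and the unbiased residual variance with M - 3 degrees of freedom (M - N - 1 with N = 2); the function name and the returned dictionary keys are hypothetical.

```python
def two_variable_regression(x1, x2, y):
    """Closed-form estimates for y = b0 + b1 (x1 - mean) + b2 (x2 - mean)."""
    m = len(y)
    if m <= 3:
        raise ValueError("need more than three observations")
    x1_bar, x2_bar = sum(x1) / m, sum(x2) / m
    d1 = [v - x1_bar for v in x1]
    d2 = [v - x2_bar for v in x2]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    sy1 = sum(yi * a for yi, a in zip(y, d1))
    sy2 = sum(yi * b for yi, b in zip(y, d2))
    delta = s11 * s22 - s12 ** 2
    if delta == 0:
        raise ValueError("x1 and x2 are exactly collinear")
    b0 = sum(y) / m
    b1 = (sy1 * s22 - sy2 * s12) / delta
    b2 = (sy2 * s11 - sy1 * s12) / delta
    residuals = [yi - (b0 + b1 * a + b2 * b) for yi, a, b in zip(y, d1, d2)]
    s2 = sum(r * r for r in residuals) / (m - 3)  # unbiased residual variance
    spread = {
        "var_b0": s2 / m,
        "var_b1": s2 * s22 / delta,
        "var_b2": s2 * s11 / delta,
        "cov_b1_b2": -s2 * s12 / delta,
    }
    return (b0, b1, b2), s2, spread
```

Because the independent variables are centered, b0 is simply the mean of the observed y values, and the covariance of b1 and b2 has the opposite sign to the cross product of x1 and x2.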


D. Correlation Analysis


In the previous formulation, it is assumed that the
independent variables, $x_k$, are known precisely, thereby
constituting the deterministic portion of the variation, and all
the random variation is associated with the dependent
variable, y. In sharp contrast to this is the framework which
regards both y and the $x_k$'s as members of a multivariate joint
normal population. Then y and the $x_k$'s are random variables
and each observation is a sample from this multivariate normal
population. Let X be the vector $(y, x_1, x_2, \ldots, x_N)^T$;
then under the above assumption the probability density function
(pdf) of X is:

$$p(X) = (2\pi)^{-p/2}\,|\Sigma|^{-1/2} \exp\left[ -\tfrac{1}{2} (X - \mu)^T \Sigma^{-1} (X - \mu) \right] \tag{4.30}$$

where:
    p = N + 1, the dimension of X
    $\mu$ = E(X), the vector mean of X
    $\Sigma$ = $E((X - \mu)(X - \mu)^T)$, the covariance
               matrix of X

Letting $\sigma_{ij}$ denote the ij-th element of $\Sigma$, then
the variance of $X_i$ (the i-th component of X) is $\sigma_{ii}$
and the correlation coefficient between $X_i$ and $X_j$,
$\rho_{ij}$, is:

$$\rho_{ij} = \frac{\sigma_{ij}}{(\sigma_{ii}\,\sigma_{jj})^{1/2}} \tag{4.31}$$

The mean and covariance completely specify the pdf
and, if they are known for a population, then there is no other
information available concerning its behavior. Thus in this
framework, once $\mu$ and $\Sigma$ are known, it is possible, at
least in principle, to calculate whatever probabilities are
required with regard to specific situations.

The question of estimating the behavior of y given
the values of the $x_k$'s in this framework becomes a question
of the conditional distributions. In particular, if X is
partitioned into two subvectors $X^{(1)}$, $X^{(2)}$ (the first q
elements of X are $X^{(1)}$, the remaining p - q are $X^{(2)}$),
then it can be shown that the expected value of $X^{(1)}$ given
values of $X^{(2)}$ is:

$$E(X^{(1)} \mid X^{(2)}) = \mu^{(1)} + \Sigma_{12} \Sigma_{22}^{-1} (X^{(2)} - \mu^{(2)}) \tag{4.32}$$

and the covariance is:

$$E\left[ \left( X^{(1)} - E(X^{(1)} \mid X^{(2)}) \right) \left( X^{(1)} - E(X^{(1)} \mid X^{(2)}) \right)^T \right] = \Sigma_{11 \cdot 2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \tag{4.33}$$

where the submatrices are defined as:

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \tag{4.34}$$

with $\Sigma_{11}$ a q by q dimensional matrix, $\Sigma_{12}$ a q
by (p-q) dimensional matrix, etc. It is conventional to call the
matrix $\Sigma_{12} \Sigma_{22}^{-1}$ the regression matrix of
$X^{(1)}$ on $X^{(2)}$. Letting $\sigma_{ij \cdot q+1,\ldots,p}$ be
the ij-th element of $\Sigma_{11 \cdot 2}$, then the diagonal
elements, $\sigma_{ii \cdot q+1,\ldots,p}$, are the variances about
the conditional mean $E(X^{(1)} \mid X^{(2)})$, and:

$$\rho_{ij \cdot q+1,\ldots,p} = \frac{\sigma_{ij \cdot q+1,\ldots,p}}{(\sigma_{ii \cdot q+1,\ldots,p}\;\sigma_{jj \cdot q+1,\ldots,p})^{1/2}} \tag{4.35}$$

defines the partial correlation coefficient between $X_i$ and
$X_j$ holding $X_{q+1}, \ldots, X_p$ fixed.

For the special case where q = 1 and the question is
to predict the first element of X (so that in this case
$X^{(1)} = y$) given the remaining p - 1 elements
($X^{(2)} = (x_1, x_2, \ldots, x_N)^T$), the equation for the
conditional mean (Equation (4.32)) is exactly the same as the
solution to the normal equations. To see this, note that, apart
from a common constant factor, $\Sigma_{22} = S$ and
$\Sigma_{21} = \Sigma_{12}^T = S_{yx}$, the vector of elements
$Syx_k$, so that:

$$b = (\Sigma_{12} \Sigma_{22}^{-1})^T = \Sigma_{22}^{-1} \Sigma_{21} = S^{-1} S_{yx} \tag{4.36}$$

Thus, once again, the same set of equations results. But the
similarity does not extend to the statistical variation of
the estimates. For example, the variance of
$Y = E(X^{(1)} \mid X^{(2)})$, the estimate of y given the
$x_k$'s, is independent of the values of the $x_k$'s; it can be
found by specializing Equation (4.33) for this case, whereas for
classical regression theory the variance of Y depends directly
on their values (see Equation (4.29)). Also, in practice the
population's vector mean and covariance matrix are unknown and
sample estimates are used, so that the above estimates are
subject to sampling fluctuations. In particular, the sampling
variation of the partial correlation coefficient can be
calculated. This fact is used in the subsequent analysis. However,
the sampling variation of the regression matrix is difficult to
obtain.

This is again in contrast to the classical regression
formulation, where the analogous vector and matrix are just the
coefficients of the normal equations, which happen to have the
form of sums of the variables and their cross products. They
are not thought of as random variables and therefore have no
sampling variation. The effect of the randomness of y is to
make the b's uncertain, and this uncertainty can be
calculated. These similarities and differences are summarized in
Table 2.
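The partial correlation coefficient can be illustrated for the smallest non-trivial case: three variables, with the third held fixed (p = 3, q = 2). The sketch below forms the conditional covariance of the first two variables from a 3 by 3 covariance matrix and normalizes it; the function name and the matrix used in the usage note are hypothetical.

```python
import math


def partial_correlation(cov):
    """Partial correlation of variables 0 and 1, holding variable 2 fixed.

    cov: 3 x 3 covariance (or correlation) matrix as nested lists.
    """
    s22 = cov[2][2]
    # Conditional covariance: Sigma_11 - Sigma_12 Sigma_22^-1 Sigma_21.
    cond = [
        [cov[i][j] - cov[i][2] * cov[2][j] / s22 for j in range(2)]
        for i in range(2)
    ]
    return cond[0][1] / math.sqrt(cond[0][0] * cond[1][1])
```

For a correlation matrix with simple correlations of 0.5, 0.6 and 0.4 between the pairs (0,1), (0,2) and (1,2), the result agrees with the familiar three-variable expression, in which the product of the two correlations with the fixed variable is removed from the simple correlation before normalizing.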


TABLE 2


SIMILARITIES AND DIFFERENCES OF THE AVAILABLE FRAMEWORKS


                                             Classical    Correlation
                                             Regression   Analysis

1. y has a random component                  yes          yes
2. x_k are random variables                  no           yes
3. Y is found via the normal equations       yes          yes
4. V(Y) is a function of x_k                 yes          no
5. b_k have easily calculated
   statistical behavior                      yes          no
6. Partial correlation coefficient has
   known statistical behavior                NA*          yes


*NA - Not applicable to this framework.
V. MULTIPLE LINEAR REGRESSION METHODOLOGY


In the previous section the available frameworks are 
presented within which the investigation of the relationship 
between species diversity index, relative toxicity, and other 
available water quality data can be conducted. The major dif- 
ference in approaches is whether to consider the independent 
variables as known precisely and not random, that is, not 
subject to statistical fluctuations such as measurement errors 
or variations due to unknown (or unmeasured) causes, or whether 
to adopt the purely random variable point of view and use the 
multivariate normal probability density functions as the model 
of the data and draw strictly probabilistic conclusions con- 
cerning the species diversity-relative toxicity relationships. 

Unfortunately neither point of view is strictly
appropriate. The behavior of certain of the independent variables,
such as chloride concentration and temperature, closely approaches
being precisely known, with only a small part of their variation
assignable to random fluctuations. The availability of
deterministic models for these variables [ref. illegible]
supports this contention. Certain other variables, such as BOD and
dissolved oxygen deficit, have deterministic components which
influence their variation - again deterministic models are
available - but there are significant random fluctuations as well,
which are due to many unknown causes. The degree of randomness is
also influenced by the accuracy and precision of the measurement.
Thus suspended solids measurements appear much more random than
dissolved oxygen measurements.

The most satisfactory analysis would use a theoretical
framework which takes into account both a deterministic and a
random component in characterizing each variable. Unfortunately
such a framework is lacking for any but the simplest situation:
one dependent variable and one independent variable [11], and
generalizations appear difficult.

In order to proceed with an analysis within the
context of known frameworks, both points of view are adopted, each
separately, and the calculations are made accordingly. Multiple
linear regression is examined in more detail in this chapter
and applied in the subsequent two chapters to the problem at
hand; then correlation analysis is further explained and
applied in Section VIII. However, since some of the basic
assumptions are violated in each case, the detailed method of
application and the conclusions drawn are necessarily somewhat
tentative.


A. Classical Multiple Linear Regression 


Classical multiple linear regression is
straightforward in its application, and when the general form of
the equation to be fit is known, and the independent variables
are accurately measured, the results are usually quite
satisfactory. An example of this type of analysis, applied to the
formulation of the reaeration coefficient in flowing streams [14],
indicates that for well understood phenomena for which the
relevant independent variables are known and no spurious
variables are included in the analysis, the result is quite
reasonable.

However, it is clear that in the present context
a major difficulty is in deciding which are the relevant
variables. Therefore a straightforward approach is ruled out.
B. Stepwise Regression 


A popular variant of classical multiple linear
regression which attempts to deal with this problem of variable
selection is stepwise regression [ref. illegible]. Forward
stepwise regression introduces independent variables one at a
time, based on various criteria of goodness for each of the
variables introduced.

A variable $x_k$ is introduced if it produces the
greatest reduction in the residual variance, $s_v^2$, and if its
coefficient $b_k$ is significantly different from zero (i.e., the
t-statistic associated with $b_k$ is significant at 95 percent).
Backward stepwise regression starts with a full classical
multiple linear regression equation, and eliminates variables,
one at a time, based on essentially the opposite criteria -
variables are removed if they affect $s_v^2$ least, and have
insignificant coefficients.

The results of a stepwise linear regression can be
quite unsatisfactory since the criterion for selecting
variables is arbitrary in the sense that there is no theoretical
guarantee that a "best" equation will result. An example of
a stepwise regression applied to the depth-averaged,
quarterly-averaged data with BSDI is given in Table 3. This
particular version [ref. illegible] of the procedure uses an
estimate of the F-statistic to judge if a variable should be added
to or deleted from the equation. The first three variables
introduced appear reasonable. The fourth variable, NO3-N, enters
with a positive (0.238) and significant (t = 2.65) coefficient.
However, at the fifth step reactive relative toxicity enters with
a positive (0.129) and marginally significant (t = 1.90)
coefficient which is clearly spurious. What is happening is
that the regression is utilizing the difference between
conservative RT and reactive RT as a variable to marginally
improve the regression. The difficulty with stepwise regression
is that the choice of the variables to be included in the
equations is based on a criterion which in no way guarantees that
the "best" variables are selected, nor guards against a chance
inclusion of a variable which may not be significant. The
difficulties inherent in stepwise regression have been
mentioned by other workers [ref. illegible]. A major problem
occurs if the independent variables are themselves significantly
correlated.


C. Reduced Rank Regression - Principal Components


The problem of choosing the "best" equation (actually
the "best" variables to include in the equation since, as
shown in Section IV, the computation of the equation
coefficients is the same in whatever framework is adopted) has
been addressed using the techniques of principal component
analysis and factor analysis applied to the independent variables.
This technique assumes that the independent variables are
random variables which are intercorrelated and seeks to find those
variables or linear combinations of variables, called factors,
which account for the largest portion of the total variance of
the independent variables. These variables or factors are
then used as the variables in a regression. The hope is that
the number of significant factors which result is less than
the total number of independent variables at hand (hence
reducing the dimension of the regression problem), the idea being
that the actual rank (the rank = the number of linearly
independent dimensions in a matrix) of the correlation matrix is
actually less than the total number of independent variables,
N. If in addition each factor or dimension is characterized
by a single variable, or at most by two variables, and there
are only a few factors, the "best" variables have been found
and the problem is straightforward thereafter. The details of
this approach are available in the literature [ref. illegible].

Like stepwise regression, this approach gives a
method of selecting the appropriate independent variables,
only in this case the presumption is that the data are highly
intercorrelated so that the actual rank of the independent
variables is significantly less than the number of variables
available. Figure 5 presents the percent of total variance
remaining versus the number of principal components removed
for the depth-averaged data set. It has been suggested that
the reduced rank is reached when only 0.10 percent of the
variance remains [19]. As can be seen, the data are essentially
full rank.

FIGURE 5


PERCENT VARIANCE REMAINING IN THE REDUCED RANK MODEL AS
A FUNCTION OF THE RANK OF THE MODEL

Thus although the independent variables are
intercorrelated, the intercorrelation is not such that some of the
variables are closely approximated as linear combinations of
the others. Hence principal components analysis does not
provide a technique for variable selection.
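The percent-variance-remaining curve and the 0.10 percent rule can be sketched from the eigenvalues of the correlation matrix of the independent variables. The eigenvalues themselves are taken as given here (computing them is outside this sketch), the function names are hypothetical, and "reached" is interpreted as the first point at which the remaining variance is at or below the threshold.

```python
def percent_variance_remaining(eigenvalues):
    """Percent of total variance left after removing the k largest components.

    Returns a list indexed by k = 0, 1, ..., N.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    left = total
    remaining = [100.0]
    for value in ordered:
        left -= value
        remaining.append(100.0 * max(left, 0.0) / total)
    return remaining


def reduced_rank(eigenvalues, threshold_percent=0.10):
    """Smallest number of components leaving no more than the threshold."""
    for k, percent in enumerate(percent_variance_remaining(eigenvalues)):
        if percent <= threshold_percent:
            return k
    return len(eigenvalues)
```

If every eigenvalue carries an appreciable share of the variance, the threshold is not met until all components have been removed, and the function returns N, which corresponds to the "essentially full rank" finding described above.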
D. Exhaustive Search 


The need for deciding which subset of the available
variables to use can be bypassed by a brute-force technique
which examines all the possible equations which relate the
dependent variable to all the subsets of the independent variables
$x_1, \ldots, x_N$. There are $2^N$ possible equations and for
N ≤ 12 there are computer programs available which perform this
exhaustive search [16]. For the BSDI analysis there are 27
variables with sufficient observations to be included; for the
MSDI and ZSDI, there are 15 relevant variables. Thus, direct
exhaustive searches are impractical.

Yet this appears to be a worthwhile approach; it
does not require any criteria for a good variable since it
tries them all. However, it does require a criterion for
identifying a good equation. Based on the fact that the
regression problem at hand does not neatly fit any set of
theoretical assumptions, the following procedure appears
reasonable: the search is carried out sequentially for the best
one variable equation, two variable equation, and so on.
The best equation is judged on two criteria based on a
division of the data into two sets of observations. A portion
of the data (33 percent) is randomly selected from the
available data and withheld from consideration. The remaining
data are used in the calculations of the regression equation.
The criterion for the best equation is that which has the
smallest residual variance for the data used, and in addition
does approximately as well on the withheld observations.
This criterion is admittedly somewhat subjective, but in general
it clearly points to at least acceptable equations and it very
effectively screens equations which do well on the data used
but are not effective on the withheld set. In particular, it
becomes quite clear that introducing a large number of variables
into an equation results in rapidly worsening residual
variance for the withheld set. Figure 6 is an example of such a
result. The residual variance, $s_v^2$, for both the data used
and the withheld set is shown versus the number of variables
in the equation for BSDI. The variables are added based on
the stepwise criteria discussed previously.

The conclusion drawn from such results is that
exhaustive searches including all $2^N$ combinations are not
necessary and that searches restricted to at most five to six
variables are sufficient. The further evidence for this
conclusion is presented in the next section, as well as a detailed
presentation of the results of the analysis.
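The restricted search with a withheld set can be sketched as follows. With N = 27 there are 2**27 = 134,217,728 subsets and with N = 15 there are 32,768, which is why the sketch limits the subset size. Everything here is an illustrative assumption rather than the program used in the study: the function names, the fixed random seed, the use of the mean squared residual (not adjusted for degrees of freedom) as the residual variance, and the choice to rank subsets only on the data used while reporting the withheld-set value alongside for comparison.

```python
import itertools
import random


def least_squares(rows, ys):
    """Solve the normal equations; rows already include a leading 1.0."""
    n = len(rows[0])
    aug = [
        [sum(r[j] * r[k] for r in rows) for j in range(n)]
        + [sum(y * r[k] for r, y in zip(rows, ys))]
        for k in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            return None  # collinear subset, skipped by the caller
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    b = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * b[k] for k in range(row + 1, n))
        b[row] = (aug[row][n] - tail) / aug[row][row]
    return b


def mean_square_residual(b, rows, ys):
    """Mean squared difference between observed and fitted values."""
    fitted = (sum(c * v for c, v in zip(b, r)) for r in rows)
    return sum((y - f) ** 2 for y, f in zip(ys, fitted)) / len(ys)


def design_rows(xs, indices, subset):
    """Rows of [1, x_j for j in subset] for the chosen observations."""
    return [[1.0] + [xs[i][j] for j in subset] for i in indices]


def search_subsets(xs, ys, max_vars, holdout_fraction=0.33, seed=0):
    """Best subset of each size up to max_vars.

    Returns {size: (subset, residual on data used, residual on withheld set)}.
    """
    indices = list(range(len(ys)))
    random.Random(seed).shuffle(indices)
    n_held = int(round(holdout_fraction * len(ys)))
    if not 0 < n_held < len(ys):
        raise ValueError("holdout fraction leaves an empty set")
    held, used = indices[:n_held], indices[n_held:]
    y_used = [ys[i] for i in used]
    y_held = [ys[i] for i in held]
    best = {}
    for size in range(1, max_vars + 1):
        for subset in itertools.combinations(range(len(xs[0])), size):
            b = least_squares(design_rows(xs, used, subset), y_used)
            if b is None:
                continue
            fit = mean_square_residual(b, design_rows(xs, used, subset), y_used)
            check = mean_square_residual(b, design_rows(xs, held, subset), y_held)
            if size not in best or fit < best[size][1]:
                best[size] = (subset, fit, check)
    return best
```

Comparing the second and third entries of each result shows whether an equation that fits the data used does approximately as well on the withheld observations, which is the screening step described above.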
A. Preliminaries


In order to apply classical regression analysis or
its variants and formulate equations which relate BSDI to water
quality variables, it is necessary that a data set which is
complete in all the variables be considered. In addition, it is
prudent to attempt to minimize the randomness of the independent
variables, in order to conform to the assumptions of the
regression analysis method. A straightforward approach is to
average the available data in a way which appears not to affect
the underlying relationships. The depth-averaged data already
have some randomness removed. In addition, since it appears
that the temporal variations of the BSDI's are not a major
feature of the data over time spans of months, as shown in Figures
7, 8, and 9, a three month or quarterly averaging procedure
appears reasonable. It is these data which are used in the
subsequent analysis.
B. Controllable and Non-controllable Variables 


FIGURE 7

BENTHIC SDI VERSUS DAY OF THE YEAR - SUISUN BAY DATA

FIGURE 8

BENTHIC SDI VERSUS DAY OF THE YEAR - SAN PABLO BAY DATA

FIGURE 9

BENTHIC SDI VERSUS DAY OF THE YEAR - SOUTH BAY DATA

The primary aim of this statistical investigation is to relate
variations of BSDI's to water quality variables, some of which
are controllable in the sense that regulatory decisions can be
enforced by changes in waste discharge practices. Thus a
convenient division of the water quality variables can be made
into noncontrollable variables, those whose magnitudes are not
directly affected by waste discharge practices, and controllable
variables, those that are affected. Table 4 presents the division
adopted. The division of the aqueous variables is relatively
straightforward. Perhaps the only difficulty is chlorosity, which
can be affected by inflow regulations. However, in the main, this
is a function of natural effects. The sediment variables present
a different problem, since the condition of the benthos can be
adversely affected by discharge practice, e.g., the presence of
waste discharges with settleable organic material. However, since
it is difficult to relate these effects precisely to discharge
practice, these variables are also assigned to the noncontrollable
category. Admittedly this is somewhat arbitrary, but as is evident
from the results presented in the next section, the conclusions
reached are not affected by this assignment.

TABLE 4

CONTROLLABLE VARIABLES

Aqueous   Inorganic PO4                   MG PO4/L
Aqueous   Total Phosphorus                MG PO4/L
Aqueous   Reactive PO4                    MG PO4/L
Aqueous   Nitrate Nitrogen                MG N/L
Aqueous   Ammonia Nitrogen                MG N/L
Aqueous   Coliform Organisms              MPN/100 ML
Aqueous   BOD - Five Day                  MG/L
Aqueous   Dissolved Oxygen Deficit        MG/L
          Relative Toxicity               K = 0.0
          Relative Toxicity               K = 0.1
          Relative Toxicity               K = 0.2

NON-CONTROLLABLE VARIABLES

Aqueous   Temperature                     Centigrade
Aqueous   Secchi Depth                    Feet
Aqueous   pH                              Units
Aqueous   Suspended Solids                MG/L
Aqueous   Chlorosity                      GM/L
Aqueous   Dissolved Oxygen Saturation     MG/L
Aqueous   Dissolved Silica                MG/L
Sediment  Temperature                     Centigrade
Sediment  Percent Sand                    Percent
Sediment  Percent Silt                    Percent
Sediment  Percent Clay                    Percent
Sediment  BOD                             MG/L
Sediment  Ion Exchange Capacity           MG/100 GMS
Sediment  Bulk Density                    GM/ML
Sediment  Moisture                        Percent
Sediment  Hexane Extractables             MG/L
Sediment  Total Sulfide                   MG/L
Sediment  Total Nitrogen                  MG/L
Sediment  Total Carbon                    MG/L
Sediment  Organic Carbon                  MG/L
Sediment  Ratio Organic C/Total N

DEPENDENT VARIABLES

Benthic SDI (BION)
Benthic SDI (BIOV)
Microplankton SDI
Zooplankton SDI

This division of the independent variables into two classes is
dictated by an objective of the statistical analysis: to assess
what portion of the observed variance in the species diversity
index is related to uncontrollable or "background" water quality
variations, and what fraction can be attributed directly to
variations in controllable variables. An approximate analysis of
variance can be constructed which divides the total observed
variance of the dependent variable into three components:
unexplained variance; explained but uncontrollable variance; and
explained controllable variance. For the purpose of establishing
regulatory policies aimed at maintaining specific levels in the
dependent variable, only the last component, the controllable
variance, is significant.

The approach taken is first to analyze the non-controllable or
background variables. Each of the controllable factors is then
examined in order to ascertain which has the primary influence
on the dependent variable. The residual variations are then
analyzed to ascertain whether any additional variables are
significantly related to the dependent variable.
C. Equations: Non-Controllable Variables 


The exhaustive search techniques described earlier are employed
in the development of one, two, and three variable equations for
the benthic animal species diversity index (BSDI). Equations
using the variables and their logs, as well as combinations
thereof, are developed using all 20 non-controllable variables.

The best one variable equation incorporates the variable with
the highest simple correlation coefficient with the BSDI; in this
case it proved to be the relationship between the log of the BSDI
and the log of chlorosity. Table 5 presents a statistical
evaluation of this equation. The multiple correlation coefficient
R = 0.606 indicates that 36.8 percent of the variance in the log
BSDI is explained by the resulting equation:

    $$\mathrm{BSDI} = e^{-1.22}\,(\mathrm{chlorides})^{0.56}$$

The t statistic of 8.46 indicates that the chloride coefficient
is highly significantly non-zero. For the sample sizes used in
this study, t = ± 1.96 are the 95 percent confidence limits for
the coefficients of the independent variables.
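
A one variable log-log fit of this kind, together with the t
statistic of its slope, can be sketched as below. This is a
minimal illustration, not the program used in the study: the
function name and the sample data are hypothetical, natural
logarithms are assumed, and ordinary least squares is assumed as
the fitting method.

```python
import math


def fit_loglog(x, y):
    """Least-squares fit of ln(y) = a + b*ln(x).

    Returns (a, b, r, t): intercept, slope, simple correlation
    coefficient, and t statistic of the slope.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    # Residual sum of squares; clipped at zero to guard against rounding.
    sse = max(syy - b * sxy, 0.0)
    se_b = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    t = b / se_b
    return a, b, r, t


# Hypothetical chlorosity (x) and diversity index (y) values.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [1.1, 1.3, 2.1, 2.7, 4.2]
a, b, r, t = fit_loglog(x, y)
print(f"ln(y) = {a:.3f} + {b:.3f} ln(x), r = {r:.3f}, t = {t:.2f}")
```

In the fitted form, exp(a) plays the role of the constant
multiplier and b the exponent of the independent variable.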


The beta coefficients, $\beta_k$, for the normalized equations
are also indicated in the equation summary. These are the
coefficients for the equation with independent variables
normalized to zero mean and unit variance.

The quantity $\beta_k \rho_{yk}$, where $\rho_{yk}$ is the
correlation coefficient between the dependent variable and the
k-th independent variable, is a measure of the fraction of the
explained variance due to variable $x_k$ in the equation [17].
The three equations presented in Table 5 indicate that the
variance explained by the chloride variable varies from 31
percent to 38 percent. The two and three variable equations are
the result of exhaustive searches of all remaining possible
variable combinations.
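
The way the products of beta coefficient and simple correlation
add up to the multiple R squared can be checked numerically. The
sketch below does this for a two variable equation; the data and
function names are hypothetical, and the closed-form expressions
for the two betas are the standard ones for standardized
variables, assumed here rather than taken from the report.

```python
import math


def zscores(values):
    """Normalize to zero mean and unit (population) variance."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / sd for v in values]


def corr(u, v):
    """Simple correlation coefficient between two sequences."""
    zu, zv = zscores(u), zscores(v)
    return sum(a * b for a, b in zip(zu, zv)) / len(u)


def beta_r_shares(x1, x2, y):
    """Return ((beta1, beta2), (beta1*r1y, beta2*r2y))."""
    r1y, r2y, r12 = corr(x1, y), corr(x2, y), corr(x1, x2)
    det = 1.0 - r12 ** 2
    b1 = (r1y - r12 * r2y) / det
    b2 = (r2y - r12 * r1y) / det
    return (b1, b2), (b1 * r1y, b2 * r2y)


def r_squared_direct(x1, x2, y, betas):
    """R squared from the residuals of the normalized equation."""
    z1, z2, zy = zscores(x1), zscores(x2), zscores(y)
    sse = sum((c - betas[0] * a - betas[1] * b) ** 2
              for a, b, c in zip(z1, z2, zy))
    sst = sum(c * c for c in zy)
    return 1.0 - sse / sst


# Hypothetical data for two independent variables and one dependent.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.0, 2.5, 2.9, 4.2, 5.1, 5.8]
betas, shares = beta_r_shares(x1, x2, y)
print(sum(shares), r_squared_direct(x1, x2, y, betas))
```

Individual shares can be negative when the independent variables
are correlated with one another, so they are best read as an
accounting of R squared rather than as strictly separate
variances.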

A measure of the goodness of each equation is obtained by
testing the predictive capability of the equation on a set of
data not employed in its development. Thirty-three percent of
the data set is set aside for purposes of checking the
equations. For a variable to qualify as a candidate for
inclusion in the equation, the multiple R squared, $R^2$, for
the equation data set and for the check data set should be
maximally improved and roughly equal (within 10 percent).
Sediment percent sand and sediment organic carbon both satisfy
these conditions.
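
The check-set procedure can be sketched as follows. Several
details are assumptions made for the illustration: every third
observation is withheld (the report does not say how the check
set was drawn), a one variable linear equation stands in for the
multi-variable equations, and "within 10 percent" is read as a
relative difference between the two R squared values.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y, intercept, slope):
    """Fraction of variance in y removed by the given equation."""
    my = sum(y) / len(y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1.0 - sse / sst


def holdout_check(x, y, tolerance=0.10):
    """Fit on two thirds of the data and check on the remaining third."""
    check_idx = set(range(2, len(x), 3))  # every third point is withheld
    fit_x = [v for i, v in enumerate(x) if i not in check_idx]
    fit_y = [v for i, v in enumerate(y) if i not in check_idx]
    chk_x = [v for i, v in enumerate(x) if i in check_idx]
    chk_y = [v for i, v in enumerate(y) if i in check_idx]
    a, b = fit_line(fit_x, fit_y)
    r2_fit = r_squared(fit_x, fit_y, a, b)
    r2_check = r_squared(chk_x, chk_y, a, b)
    agree = tolerance * r2_fit >= abs(r2_fit - r2_check)
    return r2_fit, r2_check, agree


# Hypothetical data: a linear trend with a small alternating error.
x = [float(v) for v in range(12)]
y = [2.0 * v + (0.1 if v % 2 == 0 else -0.1) for v in range(12)]
print(holdout_check(x, y))
```

An equation that fits its own data much better than the check
data is a sign that too many variables have been admitted.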

Figure 10 summarizes the three best non-controllable equations
for the BSDI. The inclusion of sediment percent sand and
sediment organic carbon in the equation results in only
marginally significant changes in $R^2$. It can be shown that,
for the number of observations used in this analysis, a
statistically significant change in $R^2$ is one greater than
5 percent [18]. Under this condition the first equation is an
adequate representation of the benthic animal species diversity
index variation due to uncontrolled environmental factors.


TABLE 5

Aqueous - Chlorosity                          gm/l

Standard Error of Estimate                    0.4286
Multiple Correlation Coefficient (R)          0.6064   (R^2 = 0.3678)
Goodness of Fit, F (1, 128)                   74.4563
Constant Term                                 -1.2224

Variable     Coeff.   Std.Dev.  T Value  Beta Coef.  Beta^2   Crit.R.  Beta R
Chlorosity   0.5557   0.0654    8.4595   0.6064      0.3678   0.6064   0.3678

Aqueous - Chlorosity                          gm/l
Sediment Percent Sand                         percent

Standard Error of Estimate                    0.4132
Multiple Correlation Coefficient (R)          0.6381   (R^2 = 0.4072)
Goodness of Fit, F (2, 127)                   43.6244
Constant Term                                 -1.8934

Variable     Coeff.        Std.Dev.  T Value  Beta Coef.  Beta^2   Crit.R.       Beta R
Chlorosity   [illegible]   0.0619    9.2065   0.6390      0.4083   0.6000        0.3835
Percent Sand [illegible]   0.0540    3.6179   0.2205      0.0487   [illegible]   [illegible]

Aqueous - Chlorosity                          gm/l
Sediment Percent Sand                         percent
Sediment Organic Carbon                       mg/l

Standard Error of Estimate                    [illegible]
Multiple Correlation Coefficient (R)          0.6683   (R^2 = 0.4466)
Goodness of Fit, F (3, 126)                   33.9055
Constant Term                                 -1.4738

Variable     Coeff.        Std.Dev.  T Value       Beta Coef.    Beta^2   Crit.R.       Beta R
Chlorosity   [illegible]   0.0686    6.8667        [illegible]   0.2736   0.6000        [illegible]
Percent Sand 0.1246        0.0546    [illegible]   0.1602        0.0257   [illegible]   0.0173

It is interesting to observe the trade-offs that exist in the
percent of BSDI variance removed by each variable in the
equation. These are also reflected in the t statistics for the
coefficients in the last two equations. The reliability of the
coefficients decreases as more variables are included. This
contention is borne out by the fact that when the fourth best
variable, sediment total nitrogen, is introduced into the
equation, the t statistic for its coefficient falls below the
95 percent confidence limit. This, coupled with the fact that it
contributes only minimally to an increase in $R^2$, effectively
eliminates it and any other variable from consideration in the
equations.

In the development of the equations, it is necessary to assess
their adequacy in removing variance in the BSDI observations.
Furthermore, all independent variables not yet in the equation
should be evaluated in the light of their capacity for
explaining the residual variance. This can be accomplished by
graphically analyzing the residual BSDI, defined as:

    $$\text{Residual BSDI} = \widehat{\mathrm{BSDI}} - \mathrm{BSDI} \qquad (6.2)$$

where:

    $\mathrm{BSDI}$ = observed BSDI

    $\widehat{\mathrm{BSDI}}$ = predicted BSDI


Figure 11 shows a graph of the predicted versus the 
actual BSDI for the log chloride - log BSDI equation. It ap- 
pears that the chloride prediction equation is somewhat re- 
stricted in its ability to predict extremely high or low BSDI's. 

The variations in the data set are composed of two parts. The
first, deterministic variance, is that due to variations in the
independent variables, which is removed by a multiple linear
regression analysis. The second part of the variance is that
due to measurement error and other unexplainable variations. To
test the assumption that the residual BSDI are normally
distributed, the residuals are normalized and their goodness of
fit to a univariate normal probability density function is
examined using the statistic [18]:

    $$\chi^2 = \sum_{i=1}^{10} \frac{(o_i - e_i)^2}{e_i}$$

where:

    $o_i$ = observed number of residuals which occur in the i-th
            decade of the normal probability density function

    $e_i$ = theoretical number of observations in the i-th decade
            ($e_i = n/10$)

    $n$   = number of observations

which is distributed approximately as a chi-square random
variable with 9 degrees of freedom (the number of decades less
one).
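
A minimal sketch of this goodness-of-fit calculation follows. It
assumes the usual Pearson form of the statistic (the sum over
the ten decades of the squared difference between observed and
expected counts, divided by the expected count); the function
names are hypothetical, and 16.919 is the tabulated 5 percent
critical value of chi-square for 9 degrees of freedom.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev

# Boundaries between the ten equal-probability decades of N(0, 1).
DECADE_EDGES = [NormalDist().inv_cdf(k / 10) for k in range(1, 10)]
CHI2_CRIT_5PCT_9DF = 16.919  # tabulated value, 9 degrees of freedom


def normalize(residuals):
    """Scale residuals to zero mean and unit variance."""
    m, s = mean(residuals), pstdev(residuals)
    return [(r - m) / s for r in residuals]


def decade_counts(z_values):
    """Count normalized residuals falling in each of the ten decades."""
    counts = [0] * 10
    for z in z_values:
        counts[bisect_right(DECADE_EDGES, z)] += 1
    return counts


def chi_square(counts):
    """Pearson statistic with equal expected counts e_i = n/10."""
    expected = sum(counts) / len(counts)
    return sum((o - expected) ** 2 / expected for o in counts)


# Hypothetical residuals placed exactly at normal quantiles: every
# decade then receives ten values and the statistic is zero.
ideal = [NormalDist().inv_cdf((k + 0.5) / 100) for k in range(100)]
print(decade_counts(ideal), chi_square(decade_counts(ideal)))
```

A statistic below the critical value, such as the 12.84 reported
for the chlorosity equation, gives no grounds at the 5 percent
level for rejecting normality of the residuals.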

Figure 12 presents the results of this analysis for Equation
6.2. The $\chi^2$ statistic with 9 degrees of freedom that
results is 12.84, which is significant at the 17.0 percent level
but not at the conventional 5 percent level, so that the
hypothesis that the unexplained residual variations are normal
random variables, as assumed, is not rejected. A normal
probability plot of the residuals is also presented in Figure
12. Ordered random variables from a normal pdf plot as a
straight line.

There may still be underlying relationships between the
residuals and any of the other independent variables. Therefore,
the residual BSDI are plotted against each of the variables
which may be significant, as well as versus the ordered sampling
station. The latter plot is designed to uncover any geographical
relationships which may be significant. A variable for which a
significant trend appears in the graph indicates a residual
correlation. Figure 13 shows the residuals plotted against the
ordered sampling stations. The chlorides, which vary spatially,
have apparently removed most of the spatial variation in the
BSDI. However, in the South Bay there is still a tendency to
overestimate the benthic SDI.

Figure 14 shows the residuals plotted with respect to
chlorosity. The purpose of such a plot is to uncover any
nonlinear relationship between log BSDI and log chlorosity. The
random scatter indicates that no such relationship exists.

Figures 15 through 19 show the correlation existing between the
BSDI residuals and the log of sediment percent sand, sediment
organic carbon, dissolved oxygen deficit, conservative relative
toxicity, and BOD5. Note the apparent lack of significant
correlation for the DO deficit and BOD5 variables. By contrast,
there are negative correlations still existing between the
residuals and conservative relative toxicity and sediment
organic carbon, as well as a noticeably positive correlation
with the percent sand.

An exhaustive search of all remaining noncontrollable variables
established that percent sand and sediment organic carbon were
the two variables which removed the most variance in the
residuals, and their inclusion in the equation effectively
randomized the residuals with respect to any other
noncontrollable variable.
D. Equations - Controllable Variables 


The significant noncontrollable variables in the data set, with
the one, two, and three variable equations, have been found in
the previous section. The same exhaustive search technique can
be applied to the controllable variables. The most significant
controllable variable in the data set is the log of the
conservative relative toxicity. When it is entered into any of
the three noncontrollable equations an additional 10 percent of
variance is removed from the BSDI data set.

Table 6 and Figure 20 present a statistical summary of the three
best equations containing a controllable factor, the log of
conservative relative toxicity. The toxicity variable accounts
for a 0.10 increase in $R^2$ while itself accounting for 15
percent of the total variance in the data. Also, the equation
coefficients are significantly non-zero as indicated by the
calculated t statistics.


The best (in the sense outlined previously) of the three
equations is that composed of log chlorides, log percent sand,
and log conservative relative toxicity. This equation accounts
for 51 percent of the total variance in the BSDI. An improvement
of only 2.2 percent in percent variance removed occurs if even
the best remaining noncontrollable variable, sediment organic
carbon, is added to the equation.

As indicated previously, an excessive number of variables in an
equation can reduce its predictive capacity. This is the case
when controllable variables such as dissolved oxygen deficit,
BOD5, and nitrate-nitrogen, and uncontrollable variables such as
suspended solids and DO saturation, are introduced into the
equation. In each case the predictive ability of the equation is
reduced, as indicated by the 33 percent check set and by reduced
t statistics for the equation coefficients.

Sediment total nitrogen, temperature, and sediment moisture
also affect the predictive capability of the equation. In
addition, their inclusion in the equation causes a
redistribution of the fraction of total variance removed by
other variables. For example, if sediment temperature is
introduced into the equation it changes the proportion of
variance explained by relative toxicity and chlorosity by 1 to
2 percent. The variance attributed to the temperature factor is
3.9 percent, but there is no net significant increase in total
variance removed.

The residual analysis for two equations involving 
conservative relative toxicity will be presented: the BSDI - 
chlorides - relative toxicity equation, and the equation which 
results with the addition of sediment percent sand and sediment 
organic carbon. 

The predictive ability of the former equation is 
shown in Figure 21, a plot of predicted versus observed BSDI. 
Again the equation tends to overpredict BSDI at higher BSDI 
but the improvement over the BSDI - chloride equation is no- 
ticeable (compare Figures 11 and 21). 

The $\chi^2$ test for the residuals is presented in Figure 22.
The residuals are significantly non-gaussian (the test statistic
is at the 0.4 percent significance level). This can also be seen
from the normal probability plot in Figure 22, as a deviation
from a straight line.
Possible non-linear relationships are ruled out by the results
of Figures 23 and 24, plots of the residuals versus log
chlorosity and log conservative relative toxicity, respectively.
No significant geographical distributions appear, as shown in
Figure 25. A slight negative correlation between the residuals
and log sediment organic carbon (Figure 26) and a more
pronounced positive relationship for log sediment percent sand
(Figure 27) still remain. No other relationships appear to be
present with controllable variables, as shown in the plots of
the residuals versus log BOD5 (Figure 28) and log DO deficit
(Figure 29).

The residuals which result from the inclusion of log sediment
percent sand and log sediment organic carbon are almost normally
distributed, as shown in Figure 30. The test statistic is at the
3 percent significance level and the normal probability plot
shows that the deviation from normality is being caused by a few
points only. The plot of predicted versus observed BSDI (Figure
31) shows an improvement over the previous equations (compare to
Figures 11 and 21). There is still no geographical dependence of
the residuals (Figure 32) and all other residual plots show no
significant trends remaining. Thus, this equation is the best
that can be done in describing the relationship of BSDI to
conservative relative toxicity and the non-controllable factors.

FIGURE 33

SUMMARY OF BENTHIC ANIMALS SPECIES DIVERSITY EQUATIONS -
NON-CONTROLLABLE FACTORS AND BOD5

An interesting feature of the equations which include relative
toxicity is that in all cases the reactive relative toxicities,
having decay rates of 0.1 and 0.2 per day, are less effective
variables in removing variance. This is true for linear as well
as log equations. The largest non-conservative relative toxicity
contribution to an equation is only 6 percent total variance
removal. This occurs in a log BSDI - linear reactive relative
toxicity (K = 0.2) equation.

If conservative relative toxicity is excluded from
consideration, the best controllable variable is BOD5. Table 7
and Figure 33 present the statistics for the equations resulting
from the introduction of log BOD5. The total variance removed by
the three equations ranges from 42 percent to 51 percent. Log
BOD5 accounts for about 7 percent of this variance. Log
chlorides is still the major contributor to the equation.

Other controllable variables that contribute positively to the
equation are log ammonia-nitrogen, log nitrate-nitrogen, and log
dissolved oxygen deficit. Log ammonia-nitrogen is the next most
important factor, contributing about 3 percent total variance
removal, while dissolved oxygen deficit accounts for less than
1.6 percent in all cases.

TABLE 7

Aqueous - Chlorosity                          gm/l
Aqueous - BOD 5 day                           mg/l

Standard Error of Estimate                    0.4110
Multiple Correlation Coefficient (R)          0.6507   (R^2 = 0.4234)

Aqueous - Chlorosity                          gm/l
Aqueous - BOD 5 day                           mg/l
Sediment Percent Sand                         percent

Standard Error of Estimate                    0.3934
Multiple Correlation Coefficient (R)          0.6898   (R^2 = 0.4759)
Goodness of Fit, F (3, 126)                   [illegible]
Constant Term                                 -1.9255

Aqueous - Chlorosity                          gm/l
Aqueous - BOD 5 day                           mg/l
Sediment Percent Sand                         percent

Standard Error of Estimate                    0.3827
Multiple Correlation Coefficient (R)          0.7129   (R^2 = 0.5082)
Goodness of Fit, F (4, 125)                   [illegible]

A geographical summary of the log chlorides, log BOD5 - BSDI
equation is presented below. Figure 34 shows the residual
spatial variation in the BSDI. Not much trend is shown in this
plot, which indicates that most of the spatial variance is
already removed. Figures 35 and 36 show the residuals versus
conservative relative toxicity and sediment organic carbon. Note
that even with BOD5 in the predictor equation there is still a
noticeable correlation existing between the residual BSDI and
log conservative relative toxicity. A similar but less striking
correlation exists with sediment organic carbon.

The residuals appear to be normal random variables, as can be
seen from Figure 37, a presentation of the histogram results and
the normal probability plot.

It has already been noted that between 46 and 63 percent of the
total BSDI variance is unexplained by the equations presented.
By employing replicate sample BSDI's computed on data collected
during 1970 and 1971, it was possible to obtain a reliable
estimate of the sampling variance associated with such
measurements.

Fifty-nine replicate samples collected in 5 areas of the San
Francisco Bay system were analyzed. Between three and five
determinations were available for each sample. The data were
employed in computing the maximum likelihood estimate of the
sampling variance, which is computed by averaging the sums of
the squares of the deviations of the replicate sample SDI's from
the sample mean. The sampling variance of the log of the BSDI
was computed to be 0.081, which indicates that 25.8 percent of
the total variance in log BSDI is due to sampling variations.
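
The sampling variance estimate described above can be sketched
as follows. The interpretation is an assumption: squared
deviations of each replicate from its own sample mean are pooled
over all samples and divided by the total number of
determinations, which is the maximum likelihood form. The
function name and the example data are hypothetical.

```python
def pooled_ml_variance(groups):
    """Pooled maximum likelihood estimate of the sampling variance.

    groups: list of lists, one inner list of replicate values
    (for example log BSDI determinations) per sample.
    """
    total_ss = 0.0
    total_n = 0
    for replicates in groups:
        m = sum(replicates) / len(replicates)
        total_ss += sum((v - m) ** 2 for v in replicates)
        total_n += len(replicates)
    return total_ss / total_n


# Hypothetical replicate determinations for two samples.
print(pooled_ml_variance([[1.0, 3.0], [2.0, 2.0, 2.0]]))

# Total variance of log BSDI implied by the two reported figures
# (0.081 being 25.8 percent of the total): about 0.314.
implied_total_variance = 0.081 / 0.258
print(round(implied_total_variance, 3))
```

Dividing by the total number of determinations rather than by
the degrees of freedom makes the estimate slightly low when each
sample has only three to five replicates.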
E. Interpretation 


A summary of the nine principal BSDI predictor equations has
been presented in Figures 10, 20, and 33. The fraction of total
variance attributable to each component variable is indicated,
as are the final predictor equations. The multiple correlation
coefficient and the percent variance removal of each equation
are also shown. The components of the BSDI variance that can be
attributed to the three categories of concern are shown in
Figure 38. The equations represented are those containing three
non-controllable factors and the two primary controllable
factors, conservative relative toxicity and BOD5. Note that the
variance associated with the relative toxicity component is
considerably larger than that due to BOD5, while the random
component of the BSDI is essentially the same in each case,
about 50 percent.

In order to relate changes in BSDI to changes in relative
toxicity (RT), these equations can be analyzed as follows. The
basic equations under consideration all have the form:

    $$\mathrm{BSDI} = A\,(\mathrm{RT})^{B} \qquad (6.3)$$

or:

    $$\mathrm{BSDI} = A\,(\mathrm{BOD}_5)^{B} \qquad (6.4)$$

or:

    $$\log \mathrm{BSDI} = \log A + B \log \mathrm{RT} \qquad (6.5)$$

where A contains all the non-controllable factors in the
equation and the B coefficient is the regression coefficient
found for either relative toxicity or BOD5. This formulation of
the equation stresses the fact that control will affect only
controllable factors. Differentiating both sides of Equation
(6.5) with respect to log relative toxicity yields:

    $$\frac{d(\log \mathrm{BSDI})}{d(\log \mathrm{RT})} = B \qquad (6.6)$$

or:

    $$\frac{d(\mathrm{BSDI})}{\mathrm{BSDI}} = B\,\frac{d(\mathrm{RT})}{\mathrm{RT}} \qquad (6.7)$$

since d log u = du/u.

Equation (6.7) states that for a given fractional change in
relative toxicity the resulting fractional change in BSDI is B
times that change.
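
As a worked illustration of Equation (6.7), the sketch below
compares the first-order estimate with the exact change implied
by the power-law form for a 10 percent change in the
controllable variable. The value B = 0.2 is used only as a
magnitude, and the function names are hypothetical.

```python
def linear_fractional_change(b, rt_fractional_change):
    """First-order estimate from Equation (6.7): dBSDI/BSDI = B dRT/RT."""
    return b * rt_fractional_change


def exact_fractional_change(b, rt_fractional_change):
    """Exact change implied by the power-law form BSDI = A * RT**B."""
    return (1.0 + rt_fractional_change) ** b - 1.0


B = 0.2  # magnitude assumed for illustration only
print(linear_fractional_change(B, 0.10))  # first-order: about 2 percent
print(exact_fractional_change(B, 0.10))   # exact: slightly under 2 percent
```

For small changes the two agree closely; for large changes, such
as an 80 percent removal, the first-order form is only a rough
guide and the full equation should be evaluated instead.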

Figure 39 is a plot of the B coefficients for conservative
relative toxicity for one through four noncontrollable variable
equations. The plot indicates that B is reasonably constant,
approximately 0.2 (± 0.05), for the equations developed in this
study regardless of the data set used in their development. This
conclusion is further strengthened by the analysis presented in
Table 8. The coefficients of the BSDI equation were developed
using four different data sets. The relative toxicity (B)
coefficient is approximately constant, indicating that there is
little spatial variation in the BSDI - relative toxicity
relationship. Thus the ratio to be expected of percent change in
BSDI to percent change in conservative RT is approximately 0.2
(± 0.05).

FIGURE 39

VARIATION IN THE B COEFFICIENT OF THE CONSERVATIVE
RELATIVE TOXICITY VARIABLE

(Axes: B coefficient versus number of non-controllable variables
in the equation.)

A similar plot (Figure 40) for the B coefficients of BOD5 gives
virtually the same result: the ratio of percent change in BSDI
to percent change in BOD5 is 0.2 (± 0.05). Thus it appears from
this analysis that small percent changes of either conservative
relative toxicity or BOD5 will not cause pronounced percent
changes in BSDI; in fact, a 5 to 1 ratio appears to exist
between conservative relative toxicity or BOD5 percent changes
and the BSDI percent change expected.

FIGURE 40

VARIATION IN THE B COEFFICIENT OF THE BOD 5-DAY VARIABLE

In addition to the above analysis, which is based essentially on
the coefficient of the controllable variable and its standard
error, it is possible to use the known uncertainty associated
with the regression equation to establish what the result of
various control policies that might be instituted would be, on
the basis of the regression analysis obtained in this study. The
regression equations chosen for these calculations are the log
BSDI - log chlorosity equations with either log conservative
relative toxicity or log BOD5 added. For these two-variable
equations the variance associated with the prediction, Y, of the
deterministic component, $\eta$, is given by Equation (4.29).
Therefore, the 95 percent confidence limits for Y are calculated
as follows:

    $$Y_{95\%} = Y \pm 1.96\,[V(Y)]^{1/2} \qquad (6.8)$$

In addition it is possible to calculate the 95 percent
confidence limits for future observations of BSDI, y, since by
assumption $y = \eta + \nu$. The best estimate that is available
for the deterministic component is Y; the random component,
$\nu$, will presumably be present in any future observations, so
the variance to be expected in y is
$V(y) = V(Y + \nu) = V(Y) + V(\nu)$, assuming $\nu$ is
independent of Y. $V(\nu)$ can be estimated as the variance
remaining, thus:

    $$V(\nu) = \sigma_y^2\,(1 - R^2) \qquad (6.9)$$

so that the 95 percent confidence limits for future
observations, $y_{95\%}$, are given by:

    $$y_{95\%} = Y \pm 1.96\,[V(Y) + V(\nu)]^{1/2} \qquad (6.10)$$
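
The two sets of confidence limits can be computed as in the
sketch below. The numerical inputs are hypothetical, the
function names are illustrative, and the variance of the
prediction is simply passed in as a number, since Equation
(4.29) is not reproduced in this section.

```python
import math

Z95 = 1.96  # two-sided 95 percent point of the normal distribution


def mean_limits(y_hat, var_y_hat):
    """95 percent limits for the deterministic component (cf. Equation 6.8)."""
    half_width = Z95 * math.sqrt(var_y_hat)
    return y_hat - half_width, y_hat + half_width


def residual_variance(total_variance, r_squared):
    """Variance remaining after regression (cf. Equation 6.9)."""
    return total_variance * (1.0 - r_squared)


def future_observation_limits(y_hat, var_y_hat, total_variance, r_squared):
    """95 percent limits for a future observation (cf. Equation 6.10)."""
    var_future = var_y_hat + residual_variance(total_variance, r_squared)
    half_width = Z95 * math.sqrt(var_future)
    return y_hat - half_width, y_hat + half_width


# Hypothetical values: prediction 1.0 with variance 0.01; total
# variance of the dependent variable 0.25; R squared 0.36.
print(mean_limits(1.0, 0.01))
print(future_observation_limits(1.0, 0.01, 0.25, 0.36))
```

The limits for a future observation are much wider than those
for the deterministic component whenever a large share of the
variance remains unexplained.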
Plots of Y and $y_{95\%}$ versus the sampling stations for each
quarter's data are presented in Figures 41 to 44 for the log
BSDI - log chlorosity - log conservative RT equation. The
observed values of the independent variables are used to
calculate Y and $y_{95\%}$, which are compared to the observed
BSDI data (the curves for Y and $y_{95\%}$ are smoothed since
they vary a little from station to station). As can be seen, the
data do in fact lie within the confidence limits for the most
part. To test the effect of an 80 percent relative toxicity
removal policy, a similar calculation, using 20 percent of the
observed conservative RT, is shown in Figure 45 for the third
quarter data. The projected change in the deterministic
component is shown, as well as the 95 percent confidence limits
to be expected for observations of BSDI under the 80 percent
removal policy. The present deterministic component and the
1960-1964 data are included for comparison purposes.
Approximately a 16 percent improvement in BSDI is predicted, but
the confidence limits are so large that a subsequent survey to
verify this improvement would necessarily have to be quite
extensive. Note that the 1960-1964 data for the most part fall
within the confidence limits, although they are clearly biased
lower than the projected mean. Figure 46 presents the confidence
limits for the deterministic component only. Thus it is
projected that under this policy a significant change would be
effected.

A similar analysis is presented for an 80 percent BOD5 removal
policy in Figures 47 through 52, with virtually identical
results.
VII. MICROPLANKTON AND ZOOPLANKTON SPECIES DIVERSITY INDEX


Predictor equations relating the microplankton and 
zooplankton species diversity indices to aqueous water qual- 
ity variables are developed using the methodology employed 
for the benthic animal species diversity equations. 

The first step of the analysis is to identify those 
non-controllable variables which explain the greatest percent 
of variance in the species diversity index. The subsequent 
addition of controllable variables to the equations then in- 
dicates the relative effect of pollution related variables. 

For purposes of the microplankton and zooplankton data analysis
the choice of non-controllable variables is somewhat limited.
Both classes of organisms are highly mobile in an estuarine
environment, owing either to their own physical functions or to
the transport mechanisms in the estuarine environment. This
being the case, it is not reasonable to expect that species
diversity would be significantly correlated to characteristics
of the bay's sediment. Correlations that do exist are small.


The equations are developed using the same analytical framework
used for the benthic animal equations. The quarterly,
depth-averaged data base is used. A set of data consisting of
one-third of the total observations is withheld from the
correlation analysis to be used as a check on the resulting
equations. The residual SDI's for each equation are then
analyzed both graphically and analytically to discern any latent
trends that might still exist. Where a residual correlation is
indicated its significance is tested by including the variable
in the equation.
A. The Microplankton Species Diversity Index Equations 


A summary of correlations existing between the microplankton
species diversity and the aqueous non-controllable variables is
presented in Table 9. The strongest single relationship is that
between linear MSDI and log secchi depth. The multiple
correlation coefficient for the equation indicates that it
removes 16 percent of the total MSDI variance. The subsequent
addition of temperature and dissolved silica to the equation
results in a 7.5 and 6.8 percent increase in the percent of
accountable non-controllable variance. The linear and log
transformed dissolved silica are both included in the equation
to account for a significant non-linearity existing in the data.

TABLE 9

Aqueous - Secchi Depth                        feet

Standard Error of Estimate                    0.3303
Multiple Correlation Coefficient (R)          0.3990   (R^2 = 0.1592)
Goodness of Fit, F (1, 156)                   29.5356
Constant Term                                 1.2646

Variable               Coeff.    Std.Dev.  T Value   Beta Coef.  Beta^2   Crit.R.   Beta R
Log Secchi Depth        0.2084   0.0383     5.4350    0.3990     0.1592    0.3990    0.1592

Aqueous - Temperature                         Centigrade
Aqueous - Secchi Depth                        feet

Standard Error of Estimate                    0.3162
Multiple Correlation Coefficient (R)          0.4843   (R^2 = 0.2345)
Goodness of Fit, F (2, 155)                   [illegible]
Constant Term                                 [illegible]

Variable               Coeff.    Std.Dev.  T Value   Beta Coef.  Beta^2   Crit.R.   Beta R
Temperature            -0.0330   0.0084    -3.9073   -0.2799     0.0783   -0.1916    0.0536
Log Secchi Depth        0.2368   0.0374     6.3299    0.4534     0.2056    0.3990    0.1809

Aqueous - Temperature                         Centigrade
Aqueous - Dissolved Silica                    mg/l
Aqueous - Secchi Depth                        feet
Aqueous - Dissolved Silica                    mg/l

Standard Error of Estimate                    [illegible]
Multiple Correlation Coefficient (R)          0.5502   (R^2 = 0.3027)
Goodness of Fit, F (4, 153)                   16.6091
Constant Term                                 2.5497

Variable               Coeff.        Std.Dev.      T Value       Beta Coef.  Beta^2   Crit.R.   Beta R
Temperature            -0.0430       0.0089        -4.7878       -0.3644     0.1328   -0.1916    0.0698
Dissolved Silica       [illegible]   [illegible]   [illegible]    0.4963     0.2464   -0.2515   -0.1248
Log Secchi Depth        0.1257       0.0529         2.3736        0.2406     0.0579    0.3990    0.0960
Log Dissolved Silica   -0.4726       0.1285        -3.6760       -0.7800     0.6084   -0.3355    0.2617

For each of the three equations the t statistic shows 
better than a 95 percent confidence in the B coefficients and 
tne goodness of fit is Significant at the 0.5 percent level. 
Figure 54 shows the relationship between observed and predic- 
ted MSDI using the log secchi depth-MSDI equation. 
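
The goodness-of-fit F values quoted for Table 9 can be checked from the tabulated R² alone. The sketch below assumes the standard overall F test for a linear regression and a sample size of n = 158, which is inferred from the reported degrees of freedom (for example F (2, 155)) rather than stated in the text. The function name is illustrative only.

```python
def overall_f(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Overall regression F statistic computed from R-squared."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


# Single-variable MSDI equation: R² = 0.1592, with an assumed n of 158.
f_secchi = overall_f(0.1592, n_obs=158, n_predictors=1)
print(round(f_secchi, 2))
```

Under these assumptions the computed value agrees closely with the F of 29.5356 listed for the log secchi depth equation.
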

Figure 55 shows a normal probability plot of the
MSDI residuals. The hypothesis that the residuals are randomly
distributed is significant at the 63 percent level. That is,
the residuals behave as a random gaussian variable. This
does not mean that there is no other bias in the data
contributing to the residual variance; it merely indicates
that the net amount of explainable variance remaining is due
to factors that in aggregate appear to be randomly distributed.
Plots of the residual MSDI versus the distance factor,
temperature, log secchi depth, and dissolved silica are
presented in Figures 56 through 59. The inclusion of secchi depth
in the equation has effectively removed its contribution to the


[Bar chart: percent of MSDI variance explained by the non-controllable
factors secchi depth, temperature, and dissolved silica, for each of the
following equations]

MSDI = 1.265 + 0.208 log secchi depth
MSDI = 1.735 + 0.237 log secchi depth - 0.033 temperature
MSDI = 2.550 + 0.126 log secchi depth - 0.043 temperature
       + 0.037 dissolved silica - 0.473 log dissolved silica


FIGURE 53


SUMMARY OF MICROPLANKTON SPECIES DIVERSITY EQUATIONS
NON-CONTROLLABLE FACTORS

MSDI data. There are still, however, significant trends
existing between the residual MSDI, temperature, and the distance
factor. Temperature and dissolved silica are therefore included
in the equation. There is no other non-controllable variable
in the data set that significantly improves the equation.

The randomness of the residuals is not altered by
the addition of these last two variables. The χ² test indicated
that the fit to a gaussian distribution is still significant
at the 50 percent level. These results are shown in
Figure 60.

The addition of the best controllable variable,
conservative relative toxicity, to the equation accounts for only
minimal improvement in the equations and still yields a χ²
statistic significant at the 53 percent level. The second best
controllable variable is ammonia-nitrogen. Its effect, like
that of conservative relative toxicity, is insignificant for the
sample sizes employed in this analysis. A review of these
equations is presented in Figures 61 and 62 and Tables 10 and 11.

The controllable variables contribute minimally to
the predictive ability of these equations, and the total
variance removed does not exceed 32 percent. It appears that there

MSDI = 1.325 + 0.179 log secchi depth - 0.082 ammonia-nitrogen   (R = 0.419)
MSDI = 1.788 + 0.209 log secchi depth - 0.033 temperature
       - 0.078 ammonia-nitrogen
MSDI = 2.534 + 0.116 log secchi depth - 0.042 temperature
       + 0.036 dissolved silica - 0.450 log dissolved silica
       - 0.050 ammonia-nitrogen


FIGURE 61

MSDI = 1.348 + 0.165 log secchi depth - 0.0013 relative toxicity
MSDI = 1.737 + 0.200 log secchi depth - 0.029 temperature
       - 0.00097 relative toxicity
MSDI = 2.488 + 0.106 log secchi depth - 0.039 temperature
       + 0.033 dissolved silica - 0.428 log dissolved silica
       - 0.00071 relative toxicity


FIGURE 62

SUMMARY OF MICROPLANKTON SPECIES DIVERSITY EQUATIONS
NON-CONTROLLABLE FACTORS AND CONSERVATIVE RELATIVE TOXICITY

TABLE 10


Aqueous - Ammonia-Nitrogen mgN/l
Aqueous - Secchi Depth feet

Standard Error of Estimate              0.3281
Multiple Correlation Coefficient R      0.419  (R² 0.1758)
Goodness of Fit, F (2, 155)             16.5510
Constant Term                           1.3255

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Ammonia-Nitrogen          -0.0816      0.046        -1.7666      -0.1405      0.0197       -0.2775      0.0389
Log Secchi Depth          0.1791       0.0415       4.3105       0.3428       0.1175       0.3990       0.1368


Aqueous - Temperature Centigrade
Aqueous - Ammonia-Nitrogen mgN/l
Aqueous - Secchi Depth feet

Standard Error of Estimate              0.3141
Multiple Correlation Coefficient R      0.4996  (R² 0.2496)
Goodness of Fit, F (3, 154)             [illegible]
Constant Term                           1.788

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Temperature               -0.0327      0.0084       -3.8937      -0.2771      0.0768       -0.1916      0.0531
Ammonia-Nitrogen          -0.0777      0.0442       -1.7587      -0.1339      0.0179       -0.2775      0.0371
Log Secchi Depth          0.2086       0.0404       5.1629       0.3994       0.1595       0.3990       0.1593


Aqueous - Temperature Centigrade
Aqueous - Ammonia-Nitrogen mgN/l
Aqueous - Dissolved Silica mg/l
Aqueous - Secchi Depth feet
Aqueous - Dissolved Silica mg/l

Standard Error of Estimate              0.3034
Multiple Correlation Coefficient R      0.5556  (R² 0.3087)
Goodness of Fit, F (5, 152)             13.5769
Constant Term                           2.5335

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Temperature               -0.0420      0.0090       -4.6598      -0.3559      0.1267       -0.1916      0.0682
Ammonia-Nitrogen          -0.0499      0.0435       -1.1456      -0.0859      0.0073       -0.2775      0.0238
Dissolved Silica          0.036        0.0148       2.4488       0.4822       0.2326       -0.2515      -0.1213
Log Secchi Depth          0.1158       0.0535       2.16         [illegible]  0.0492       0.3990       0.0885
Log Dissolved Silica      -0.4504      0.1298       -3.4675      -0.7433      0.5525       -0.3355      0.2494


TABLE 11


Relative Toxicity K = 0.0
Aqueous - Secchi Depth feet

Standard Error of Estimate              0.3218
Multiple Correlation Coefficient R      0.4552  (R² 0.2072)
Goodness of Fit, F (2, 155)             20.2595
Constant Term                           1.348

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Relative Toxicity         -0.0013      0.0004       -3.0645      -0.2343      0.0549       -0.3462      0.0811
Log Secchi Depth          0.1650       0.0399       4.1329       0.3160       0.0998       0.3990       0.1261


Aqueous - Temperature Centigrade
Relative Toxicity K = 0.0
Aqueous - Secchi Depth feet

Standard Error of Estimate              0.3116
Multiple Correlation Coefficient R      0.5115  (R² 0.2616)
Goodness of Fit, F (3, 154)             18.1835
Constant Term                           1.7374

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Temperature               -0.0287      0.0085       -3.3696      -0.2434      0.0592       -0.1916      0.0466
Relative Toxicity         -0.0010      0.0004       -2.3764      -0.1800      0.0324       -0.3462      0.0623
Log Secchi Depth          0.1999       0.0400       4.9934       0.3826       0.1464       0.3990       0.1527


Aqueous - Temperature Centigrade
Aqueous - Dissolved Silica mg/l
Relative Toxicity K = 0.0
Aqueous - Secchi Depth feet
Aqueous - Dissolved Silica mg/l

Standard Error of Estimate              0.3017
Multiple Correlation Coefficient R      0.5626  (R² 0.3166)
Goodness of Fit, F (5, 152)             14.0846
Constant Term                           2.4879

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Temperature               -0.0392      0.0091       -4.2806      [illegible]  0.1107       -0.1916      0.0637
Dissolved Silica          0.0331       0.0149       2.2116       0.4396       0.1933       -0.2515      -0.1105
Relative Toxicity         -0.0007      0.0004       -1.7557      -0.1313      0.0172       -0.3462      0.0454
Log Secchi Depth          0.1059       0.0538       1.9683       0.2027       0.0411       0.3990       0.0808
Log Dissolved Silica      -0.4280      0.1302       -3.2878      -0.7064      0.4991       -0.3355      0.2370

are other underlying phenomena influencing microplankton
species diversity that are not included in our data set.
Alternatively, it is quite possible that a significant amount of
the residual variance is due to sampling error. Unlike the
benthic animal diversity indices, where information regarding
sampling variations is available, there is no information
regarding the magnitude of such errors in the MSDI. Since sampling
error is a random phenomenon, such errors should be gaussian
distributed. The χ² test for the randomness of the residual
MSDI indicates such a distribution.

It is therefore unclear whether the residual variance
in the MSDI is due to sampling error in the dependent and
independent variables or whether it points to other variables
not included in the data base. No conclusion regarding
either alternative can be presented at this time.


B. The Zooplankton Species Diversity Index Equations 


The correlations between the log of zooplankton species
diversity and the aqueous water quality variables are somewhat
better defined than those for the microplankton SDI. The
principal non-controllable variables are log of suspended solids
and log of temperature. The equations relating SDI to
non-controllable factors are presented in Table 12.

Suspended solids is the primary variable, accounting
for 18 percent of the ZSDI variance. The addition of log
temperature and log pH increases this to 30 percent, while not
significantly affecting the reliability of the equation coefficients.
This is borne out by the t statistics, which are all significant
at greater than the 95 percent level. Additionally, the
F test for the goodness of fit of the equations is significant
at the 0.5 percent level.
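
Because both sides of the zooplankton equations are log transformed, each fitted equation can also be read as a power law. The sketch below back-transforms the single-variable equation of Table 12 (constant 0.6304, coefficient -0.2786 on log suspended solids). It assumes natural logarithms, an assumption suggested by the leading constant of 1.88 shown for this equation in Figure 63. The function name and the 20 mg/l example concentration are hypothetical.

```python
import math

CONSTANT = 0.6304      # constant term, Table 12, one-variable equation
SS_EXPONENT = -0.2786  # coefficient on log suspended solids


def zsdi_from_suspended_solids(ss_mg_per_l: float) -> float:
    """Back-transform log ZSDI = a + b*log(SS) into ZSDI = exp(a) * SS**b."""
    return math.exp(CONSTANT) * ss_mg_per_l ** SS_EXPONENT


multiplier = math.exp(CONSTANT)          # leading constant of the power law
print(round(multiplier, 2))
print(zsdi_from_suspended_solids(20.0))  # hypothetical 20 mg/l sample
```

The negative exponent means the predicted ZSDI falls as suspended solids rise, which is the inverse relationship indicated by the negative correlation in Table 12.
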

A graphical summary of the variance accountable by
each variable is presented in Figure 63. The major portion
of the explained variance is due to log suspended solids in
the one- and two-variable equations. However, in the
three-variable equation, pH is the dominant factor. This probably
does not reflect a causative relationship between pH and ZSDI,
because the range of pH is rather restrictive and not in a
range where inhibition is likely. It is probably due to a
strong spatial or temporal correlation between the two factors
and is, therefore, a statistically valid parameter in the
equation.


TABLE 12


Aqueous - Suspended Solids mg/l

Standard Error of Estimate              0.4645
Multiple Correlation Coefficient R      0.4216  (R² 0.178)
Goodness of Fit, F (1, 143)             30.9270
Constant Term                           0.6304

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Log Suspended Solids      -0.2786      0.0501       -5.56        -0.4216      0.178        -0.4216      0.178


Aqueous - Temperature Centigrade
Aqueous - Suspended Solids mg/l

Standard Error of Estimate              0.448
Multiple Correlation Coefficient R      0.4913  (R² 0.2414)
Goodness of Fit, F (2, 142)             22.6016
Constant Term                           [illegible]

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Log Temperature           0.6649       0.1926       [illegible]  0.2661       0.0708       0.3733       0.0994
Log Suspended Solids      -0.2227      0.0509       -4.3709      -0.3369      0.1135       -0.4216      0.1420


Aqueous - Temperature Centigrade
Aqueous - pH units
Aqueous - Suspended Solids mg/l

Standard Error of Estimate              0.4328
Multiple Correlation Coefficient R      0.5442  (R² 0.2961)
Goodness of Fit, F (3, 141)             19.7766
Constant Term                           -13.3636

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Log Temperature           0.5866       [illegible]  [illegible]  0.2347       0.0551       0.3733       0.0876
Log pH                    5.8250       [illegible]  [illegible]  0.2745       0.0754       0.4425       0.1215
Log Suspended Solids      -0.1363      0.0557       -2.4464      -0.2063      0.0426       -0.4216      0.0870


[Bar chart: percent of ZSDI variance explained by the non-controllable
factors suspended solids, temperature, and pH, for each equation]

ZSDI = 1.88 (suspended solids)^-0.279
ZSDI = 0.262 (suspended solids)^-0.223 x (temperature)^0.665
[third equation, adding pH: illegible]


FIGURE 63


SUMMARY OF ZOOPLANKTON SPECIES DIVERSITY
EQUATIONS - NON-CONTROLLABLE FACTORS


Plots of the residual zooplankton SDI for the log
suspended solids-log temperature-log ZSDI equation versus log
suspended solids, log temperature, log dissolved solids, log
secchi depth, log pH, log chlorosity, log relative toxicity,
and log nitrate-nitrogen are presented in Figures 64 through
71. The dominant trends indicated are those in pH,
nitrate-nitrogen, and conservative relative toxicity.

The exhaustive search procedure led to the same
conclusions regarding the best controllable variables in the
equation. Nitrate-nitrogen was the best controllable factor,
accounting for two-thirds of the total explainable variance.
The statistics on these equations are presented in Table 13.
The log suspended solids-log nitrate-nitrogen equation is the
best structured equation that includes a controllable factor.
It consists of only two variables, both of which
have stable coefficients. The addition of any other
controllable variables to the equation results in less than a
2 percent increase in R².

Figure 73 presents a plot of the actual ZSDI versus
the values that are predicted by this equation.

TABLE 13


Aqueous - Suspended Solids mg/l
Aqueous - Nitrate-Nitrogen mgN/l

Standard Error of Estimate              0.4051
Multiple Correlation Coefficient R      0.6155  (R² 0.3789)
Goodness of Fit, F (2, 142)             43.3284
Constant Term                           -0.2399

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Log Suspended Solids      -0.1738      0.0463       [illegible]  -0.2631      0.0692       -0.4216      0.1109
Log Nitrate-Nitrogen      -0.4015      0.0592       -6.7821      -0.4757      0.2263       -0.5634      0.2680


Aqueous - Temperature Centigrade
Aqueous - Suspended Solids mg/l
Aqueous - Nitrate-Nitrogen mgN/l

Standard Error of Estimate              0.4062
Multiple Correlation Coefficient R      0.6164  (R² 0.3800)
Goodness of Fit, F (3, 141)             [illegible]
Constant Term                           0.3483

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Log Temperature           -0.1097      0.2226       -0.4929      -0.0439      0.0019       0.3733       -0.0164
Log Suspended Solids      -0.1770      0.0469       [illegible]  -0.2679      0.0718       -0.4216      0.113
Log Nitrate-Nitrogen      -0.4246      0.0756       -5.6142      -0.5030      0.2531       -0.5634      0.2834


Aqueous - Temperature Centigrade
Aqueous - pH units
Aqueous - Suspended Solids mg/l
Aqueous - Nitrate-Nitrogen mgN/l

Standard Error of Estimate              0.4035
Multiple Correlation Coefficient R      0.6265  (R² 0.3925)
Goodness of Fit, F (4, 140)             [illegible]
Constant Term                           -0.6244

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Log Temperature           -0.0637      0.2228       -0.2858      -0.0254      0.0006       0.3733       -0.0095
Log pH                    2.968        1.7491       1.6967       0.1398       0.0195       0.4425       0.0618
Log Suspended Solids      -0.1381      0.0519       -2.6588      -0.2090      0.0437       -0.4216      0.0881
Log Nitrate-Nitrogen      -0.3775      0.0801       -4.7130      -0.4472      0.2000       -0.5634      0.2520

The hypothesis that the residuals for this equation
are gaussian distributed is significant at the 0.003 percent
level, indicating that there is still significant bias in the
residuals. This result is presented in Figure 74.

A graphical evaluation of residual correlation for
this equation is presented in Figures 75 through 82. The plots
are presented for ZSDI residual versus temperature, suspended
solids, relative toxicity, and BOD5. Note that there is an
apparent lack of correlation between the residuals and any
variable other than conservative relative toxicity.

However, when log conservative relative toxicity is
entered into the equation as a second controllable factor there
is no discernible increase in the goodness of the equation.

The impact of conservative relative toxicity on the
non-controllable factor ZSDI equation is indicated in Table
14. Figure 83 shows the division of the explained variance due
to each component variable. The effect of entering
conservative relative toxicity into the analysis is marginally
significant from the standpoint of incremental variance removed. It
only accounts for about one-fifth of the total variance explained
by the equation.
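
The "one-fifth" figure can be reproduced from the columns of Table 14. In these tables the Beta R entry for each variable appears to be the product of its standardized (beta) coefficient and its simple correlation with the dependent variable (Crit. R), and the Beta R entries sum to R². The sketch below applies that identity to the four-variable equation. The dictionary layout and names are illustrative, and the numbers are those read from Table 14.

```python
# (beta coefficient, simple correlation with log ZSDI) read from Table 14,
# four-variable equation
terms = {
    "temperature": (0.3378, 0.3733),
    "pH": (0.1980, 0.4425),
    "suspended solids": (-0.1198, -0.4216),
    "relative toxicity": (-0.2355, -0.2821),
}

# Each variable's contribution to explained variance is beta * r.
contributions = {name: beta * r for name, (beta, r) in terms.items()}
r_squared = sum(contributions.values())
toxicity_share = contributions["relative toxicity"] / r_squared

print(f"R^2 = {r_squared:.4f}")
print(f"relative toxicity share = {toxicity_share:.2f}")
```
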
TABLE 14


Aqueous - Suspended Solids mg/l
Relative Toxicity K = 0.0

Standard Error of Estimate              0.4616
Multiple Correlation Coefficient R      0.4401  (R² 0.1937)
Goodness of Fit, F (2, 142)             17.056
Constant Term                           0.7331

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Log Suspended Solids      -0.2429      0.0541       -4.4830      -0.3676      0.1351       -0.4216      0.1550
Log Relative Toxicity     -0.0746      0.0446       -1.6726      -0.1371      0.0188       -0.2821      0.0386


Aqueous - Temperature Centigrade
Aqueous - Suspended Solids mg/l
Relative Toxicity K = 0.0

Standard Error of Estimate              0.4300
Multiple Correlation Coefficient R      0.5526  (R² 0.3054)
Goodness of Fit, F (3, 141)             20.6649
Constant Term                           -1.9952

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Log Temperature           0.9672       [illegible]  [illegible]  0.387        0.1499       0.3733       0.1445
Log Suspended Solids      -0.1185      0.0568       -2.0852      -0.1793      0.0322       -0.4216      0.0756
Log Relative Toxicity     -0.1645      0.0457       -3.6025      -0.3021      0.0913       -0.2821      0.0852


Aqueous - Temperature Centigrade
Aqueous - pH units
Aqueous - Suspended Solids mg/l
Relative Toxicity K = 0.0

Standard Error of Estimate              0.4235
Multiple Correlation Coefficient R      0.575  (R² 0.3307)
Goodness of Fit, F (4, 140)             17.2994
Constant Term                           [illegible]

Variable                  Coeff        Std Dev      T Value      Beta Coef    Beta² Coeff  Crit. R      Beta R
Log Temperature           0.8441       [illegible]  [illegible]  0.3378       0.1141       0.3733       0.1261
Log pH                    4.2034       1.8244       2.3059       0.1980       0.0392       0.4425       0.0876
Log Suspended Solids      -0.0791      0.0585       -1.35        -0.1198      0.0143       -0.4216      0.0505
Log Relative Toxicity     -0.1282      0.0476       -2.6910      -0.2355      0.0554       -0.2821      0.0664


[Bar chart: percent of ZSDI variance explained by suspended solids,
temperature, relative toxicity (K = 0.0), and pH, for each equation]

ZSDI = 2.09 (suspended solids)^-0.243 x (relative toxicity)^[illegible]
ZSDI = 0.136 (suspended solids)^[illegible] x (temperature)^0.967
       x (relative toxicity)^-0.165
[third equation, adding pH: illegible]


FIGURE 83


SUMMARY OF ZOOPLANKTON SPECIES DIVERSITY EQUATIONS
NON-CONTROLLABLE FACTORS AND CONSERVATIVE RELATIVE TOXICITY


Another variable that was somewhat significant in
explaining variance in the log ZSDI is log ammonia-nitrogen.
However, its incremental contribution is only 3 percent greater
than that provided by the non-controllable variable equations.

As mentioned previously, the magnitude of the χ²
statistic indicates that there is some type of bias existing in
the residuals. This suggests two possible conclusions: (1)
there is a variable which has a significant effect on ZSDI that
is not included in the data base; and (2) there is a strongly
biased measurement error in the dependent variable, ZSDI.
Definitive statistical statements cannot be made regarding either
possibility because of a lack of data on which to make the
judgment. Apparently, the log ZSDI-log suspended solids, log
nitrate-nitrogen equation is the best equation that the data
can support.

VIII. PARTIAL CORRELATION ANALYSIS


As pointed out in the previous sections, the problem
of estimating the effect of relative toxicity on benthic
diversity index does not exactly fit into the frameworks of
classical regression analysis and its variants. The other
available analysis framework is partial correlation analysis,
discussed in section IV, which is based on the assumption
that the data are samples from a multivariate normal density
function. To check this assumption, the univariate histograms
for each variable have been compared to a univariate
normal density function using a chi-square goodness of fit
test with ten equiprobable histogram intervals located at the
decile points of the normal density function. The results
of this test on both the variables themselves and their
natural logs indicate that neither the variables nor their logs
are normally distributed, although the logs are closer. Since
the marginal distributions are not all normal, the joint
density is not a multivariate normal density. (It should be
noted that the converse of this statement, that normal marginal
distributions imply a normal multivariate distribution, is
untrue.) Thus the data set does not satisfy the assumptions
implicit in the framework of partial correlation analysis, and
the degree to which this deviation affects the significance
tests to be employed is unknown.
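
The normality check described above can be sketched as follows. This is an illustration under stated assumptions, not the study's program: the reference normal density is fitted here with the sample mean and standard deviation (the report does not say how it was parameterized), the function name is hypothetical, and the sample data are synthetic.

```python
from statistics import NormalDist, mean, stdev


def equiprobable_chi_square(values, n_bins=10):
    """Chi-square statistic for normality using equal-probability bins.

    Bin edges are the quantiles (deciles when n_bins=10) of a normal
    distribution fitted with the sample mean and standard deviation.
    """
    fitted = NormalDist(mean(values), stdev(values))
    edges = [fitted.inv_cdf(k / n_bins) for k in range(1, n_bins)]
    counts = [0] * n_bins
    for v in values:
        counts[sum(v > e for e in edges)] += 1  # index of the bin holding v
    expected = len(values) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)


# Synthetic data only: evenly spaced normal quantiles, which fill the
# ten bins almost perfectly evenly.
sample = [NormalDist(0, 1).inv_cdf((i + 0.5) / 200) for i in range(200)]
print(equiprobable_chi_square(sample))
```

A large statistic, judged against a chi-square distribution with the appropriate degrees of freedom, indicates departure from normality, which is the outcome the text reports for both the raw and the log-transformed variables.
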

The application of partial correlation analysis
consists in calculating the partial correlation coefficients and
their confidence limits for various pairs of variables while
holding the effects of other sets of variables constant (the
variables whose effects are being held constant are called
the conditional variables, since the analysis is based on the
conditional distribution). Ideally it would be desirable to
include all the other available variables in the conditional
set when investigating the partial correlation of a particular
variable pair. However, numerical difficulties encountered in
inverting a covariance matrix of this size precluded this
approach. Instead, the variables included in the conditional set
were those which appeared to be important in the regression
analyses, as well as some additional variables to test their
effect on the partial correlation. The results for the log
of the BSDI and conservative relative toxicity are shown in
Table 15. The correlation coefficient between BSDI and
conservative relative toxicity is -0.12, which is significant at
the 95 percent level, and negative, indicating an inverse
relationship. When the effects of the important noncontrollable
variables are removed [Table 15(b)] the partial correlation
coefficient increases in magnitude to -0.18. When the effect of
BOD5 is removed [Table 15(c)] essentially the same correlation
coefficient is found. This is a suggestive result since it
implies that the BSDI-conservative RT correlation is not
related to BOD5 variations. A similar result is found if the
effects of the remaining controllable variables are removed
[Table 15(d)]. If the variables themselves are used in the
analysis the results are similar to the log variable results
[Table 16].
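
A minimal sketch of the calculation follows. The report obtained partial correlations by inverting a covariance matrix; the sketch instead uses the mathematically equivalent recursion that removes one conditional variable at a time, chosen here only to avoid a matrix library. The confidence limits use Fisher's z transformation, which is an assumption, since the report does not state how its limits were computed. The correlation matrix and the sample size of 500 are hypothetical.

```python
import math


def partial_corr(r, i, j, cond):
    """Partial correlation of variables i and j given the indices in cond.

    r is a full correlation matrix (list of lists). The recursion removes
    one conditional variable at a time.
    """
    if not cond:
        return r[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(r, i, j, rest)
    r_ik = partial_corr(r, i, k, rest)
    r_jk = partial_corr(r, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik**2) * (1 - r_jk**2))


def fisher_limits(rho, n_obs, n_cond=0, z_crit=1.96):
    """Approximate 95 percent limits via Fisher's z transformation."""
    half_width = z_crit / math.sqrt(n_obs - n_cond - 3)
    z = math.atanh(rho)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Hypothetical three-variable correlation matrix (not data from this study).
r = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
print(partial_corr(r, 0, 1, [2]))       # (0.5 - 0.25) / 0.75
print(fisher_limits(-0.12, n_obs=500))  # n_obs is an assumed sample size
```

A coefficient is judged significant at the 95 percent level when its limits exclude zero, which is how the entries in Tables 15 through 21 are read.
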

For BOD5 as the controllable variable the results of
a similar set of partial correlation calculations are presented
in Table 17 for log variables and Table 18 for the variables
themselves. Although there is a significant correlation
between log BSDI and log BOD5, it becomes insignificant when

TABLE 15


BENTHIC SDI AND CONSERVATIVE RELATIVE TOXICITY
LOG OF THE VARIABLES


Conditioned Variables*            Partial Correlation Coefficient
                                  (95% Confidence Limits)

(a) None                          -0.12 (-0.21, -0.04)

(b) Chlorosity                    -0.18 (-0.26, -0.09)
    Sediment % Sand
    Sediment Organic Carbon

(c) BOD5                          -0.16 (-0.24, -0.07)

(d) Dissolved Oxygen Deficit      [illegible] ([illegible], -0.06)
    Ammonia-nitrogen
    Nitrate-nitrogen


*(c) includes variables in group (b)
 (d) includes variables in groups (b) and (c)

TABLE 16


BENTHIC SDI AND CONSERVATIVE RELATIVE TOXICITY
LINEAR VARIABLES


Conditioned Variables*            Partial Correlation Coefficient
                                  (95% Confidence Limits)

(a) None                          -0.16 (-0.25, -0.08)

(b) Chlorosity                    [illegible] (-0.26, -0.08)
    Sediment % Sand
    Sediment Organic Carbon

(c) BOD5                          -0.27 (-0.35, -0.19)

(d) Dissolved Oxygen Deficit      -0.22 (-0.30, -0.13)
    Ammonia-nitrogen
    Nitrate-nitrogen


*(c) includes variables in group (b)
 (d) includes variables in groups (b) and (c)

TABLE 17


BENTHIC SDI AND BOD5
LOG OF THE VARIABLES


Conditioned Variables*            Partial Correlation Coefficient
                                  (95% Confidence Limits)

(a) None                          -0.11 (-0.20, -0.02)

(b) Chlorosity                    -0.08 (-0.17, -0.01)
    Sediment % Sand
    Sediment Organic Carbon

(c) Conservative Relative         0.03 (-0.06, 0.12)
    Toxicity

(d) Dissolved Oxygen Deficit      0.02 (-0.07, 0.11)
    Ammonia-nitrogen
    Nitrate-nitrogen


*(c) includes variables in group (b)
 (d) includes variables in groups (b) and (c)

TABLE 18


BENTHIC SDI AND BOD5
LINEAR VARIABLES


Conditioned Variables*            Partial Correlation Coefficient
                                  (95% Confidence Limits)

(a) None                          -0.06 (-0.15, 0.02)

(b) Chlorosity                    0.0 (-0.09, 0.09)
    Sediment % Sand
    Sediment Organic Carbon

(c) Conservative Relative         0.12 (0.02, 0.20)
    Toxicity

(d) Dissolved Oxygen Deficit      0.11 (0.02, 0.20)
    Ammonia-nitrogen
    Nitrate-nitrogen


*(c) includes variables in group (b)
 (d) includes variables in groups (b) and (c)

conservative relative toxicity is added to the conditioned
set. This result again implies that the significant
relationship is between BSDI and conservative relative toxicity.

The use of a nonconservative or reactive relative
toxicity has been suggested as a possible controllable
variable. The partial correlation analysis for log variables,
using a relative toxicity distribution which decays following
first order kinetics (K = 0.1 day⁻¹), is presented in Table
19. Initially, with no conditioned variables, there is a
significant correlation between log BSDI and log reactive
relative toxicity (-0.27). However, as the effects of the
noncontrollable variables are removed the partial correlation
coefficient drops, and with the removal of the effect of BOD5 the
partial correlation coefficient becomes insignificant. Hence
it appears that reactive relative toxicity is not correlated
to BSDI if other variables are considered as well.

A similar analysis for a more highly reactive
relative toxicity (K = 0.2 day⁻¹) is shown in Table 20. Again the
correlation between log BSDI and log reactive RT becomes
insignificant as the effects of the other significant variables are
removed.
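
First order kinetics means that a reactive relative toxicity declines exponentially with time, RT(t) = RT(0) exp(-K t). The sketch below shows what the two decay rates imply in terms of half-life and of the fraction remaining after a given time. The 5-day time is a hypothetical illustration, the function names are not from the report, and the report does not describe here how the decay was applied along the estuary.

```python
import math


def remaining_fraction(k_per_day: float, days: float) -> float:
    """Fraction of the initial toxicity left after first-order decay."""
    return math.exp(-k_per_day * days)


def half_life_days(k_per_day: float) -> float:
    """Time for a first-order decaying quantity to fall by half."""
    return math.log(2) / k_per_day


for k in (0.1, 0.2):  # decay rates used for Tables 19 and 20
    print(k, round(half_life_days(k), 2), round(remaining_fraction(k, 5.0), 3))
```
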
TABLE 19


BENTHIC SDI AND REACTIVE RELATIVE TOXICITY
LOG VARIABLES
(K = 0.1 day⁻¹)


Conditioned Variables*            Partial Correlation Coefficient
                                  (95% Confidence Limits)

(a) None                          -0.27 (-0.35, -0.18)

(b) Chlorosity                    -0.13 (-0.22, -0.03)
    Sediment % Sand
    Sediment Organic Carbon

(c) BOD5                          -0.09 (-0.18, -0.00)

(d) Dissolved Oxygen Deficit      -0.09 (-0.18, -0.00)
    Ammonia-nitrogen
    Nitrate-nitrogen


*(c) includes variables in group (b)
 (d) includes variables in groups (b) and (c)

TABLE 20


BENTHIC SDI AND REACTIVE RELATIVE TOXICITY (K = 0.2 day⁻¹)
LOG VARIABLES


Conditioned Variables             Partial Correlation Coefficient
                                  (95% Confidence Limits)

(a) None                          -0.28 (-0.36, -0.20)

(b) Chlorosity                    [illegible] (-0.19, -0.02)
    Sediment % Sand
    Sediment Organic Carbon

(c) BOD5                          -0.07 (-0.16, 0.02)

(d) Dissolved Oxygen Deficit      -0.07 (-0.16, 0.02)
    Ammonia-nitrogen
    Nitrate-nitrogen

(e) Conservative Relative         0.02 (-0.06, 0.12)
    Toxicity

A relationship between relative toxicity and
ammonia-nitrogen has recently been demonstrated. By implication
it might be expected that ammonia-nitrogen should have some
effect on BSDI. That this is not the case is shown by the partial
correlation analysis of log BSDI and log NH3 presented in Table
21. The correlation coefficients are not significantly different
from zero.

TABLE 21


BENTHIC SDI AND AMMONIA NITROGEN
LOG VARIABLES


Conditioned Variables             Partial Correlation Coefficient
                                  (95% Confidence Limits)

(a) None                          -0.02 (-0.11, 0.07)

(b) Chlorosity                    -0.03 (-0.12, 0.06)
    Sediment % Sand
    Sediment Organic Carbon

(c) BOD5                          0.001 (-0.09, 0.09)

(d) Dissolved Oxygen Deficit      -0.001 (-0.09, 0.09)
    Nitrate-nitrogen

(e) Conservative Relative         0.05 (-0.04, 0.14)
    Toxicity

APPENDIX A


STATISTICAL SUMMARY OF DATA COLLECTED
BY THE UNIVERSITY OF CALIFORNIA, BERKELEY
1960-1964

APPENDIX B


SPATIAL AND TEMPORAL VARIATIONS OF PERTINENT 
WATER QUALITY, SEDIMENT CHARACTERISTIC AND 
SPECIES DIVERSITY INDEX DATA 
APPENDIX C


DATA LISTING 


QUARTERLY AVERAGED DATA FILE 



KEY TO DATA CONTAINED IN APPENDIX C 


Locations:
SSXX - Suisun Bay
SPXX - San Pablo Bay
NBXX - North Bay
CBXX - Central Bay
LBXX - Lower Bay
SBXX - South Bay

Dates: Month/Day/Year

Month refers to the Quarter Specified as follows:

 1    First Quarter, January - March
      Second Quarter, April - June
      Third Quarter, July - September
10    Fourth Quarter, October - December

CALIFORNIA


The Resources Agency 
STATE WATER RESOURCES CONTROL BOARD 
1416 Ninth Street, Sacramento, California 95814 


CALIFORNIA REGIONAL 
WATER QUALITY CONTROL BOARDS 


NORTH COAST REGION (1) 
Suite F, 2200 County Center Drive 


Santa Rosa, California 95401


SAN FRANCISCO BAY REGION (2) 
1111 Jackson Street, Room 6040 
Oakland, California 94607 


CENTRAL COAST REGION (3) 
2238 Broad Street 
San Luis Obispo, California 93401 


LOS ANGELES REGION (4) 
Room 4027, 107 South Broadway 
Los Angeles, California 90012 


CENTRAL VALLEY REGION (5) 
3201 S Street 
Sacramento, California 95816 


FRESNO BRANCH OFFICE 

(Central Valley Region) 

3374 Shields Avenue, (P.O. Box 2188) 
Fresno, California 93719 


LAHONTAN REGION (6) 

1014 Blue Lake Avenue, Suite 3 

(P. O. Box 33829) 

South Lake Tahoe, California 95705 


COLORADO RIVER BASIN REGION (7) 
81-715 Highway 111 (P.O. Drawer 1) 
Indio, California 92201 


SANTA ANA REGION (8) 
6833 Indiana Avenue, Suite 1 


Riverside, California 92506 


SAN DIEGO REGION (9) 
6154 Mission Gorge Road, Suite 205 
San Diego, California 92120 